former clinical studies. There are still a great number of patients who will not benefit from
systemic chemotherapy. Breast cancers in particular are very heterogeneous in their aggressiveness
and treatment response. They contain different genetic mutations and variations affecting growth
characteristics and sensitivity to several drugs. Identification of each tumor's molecular fingerprint
could therefore help to segregate patients who have particularly aggressive tumors or who need to be
treated with specific beneficial therapies. As research involving genetics and associated responses
to treatment matures, standard practice will undoubtedly become more individualized, enabling
physicians to provide specific treatment regimens matched with a tumor's genetic profile to
ensure optimal outcomes.

SUMMARY OF THE INVENTION

The present invention relates to the identification of 185 human genes being differentially 
expressed in neoplastic tissue resulting in an altered clinical behavior of a neoplastic lesion. The 
differential expression of these 185 genes is not limited to a specific neoplastic lesion in a certain 
tissue of the human body. 

In preferred embodiments of this invention the neoplastic lesion in which the expression of these
185 genes is altered is a cancer of the human breast. This cancer is not limited to females
and may also be diagnosed and analyzed in males.

The invention relates to various methods, reagents and kits for diagnosing, staging, prognosis,
monitoring and therapy of breast cancer. "Breast cancer" as used herein includes carcinomas (e.g.,
carcinoma in situ, invasive carcinoma, metastatic carcinoma) and pre-malignant conditions,
neomorphic changes independent of their histological origin (e.g. ductal, lobular, medullary, mixed
origin). The compositions, methods, and kits of the present invention comprise comparing the level
of mRNA expression of a single or a plurality (e.g. 2, 5, 10, or 50 or more) of genes (hereinafter
"marker genes", listed in Table 1a and 1b, SEQ ID NO: 1 to 165 and 472 to 491; the respective
polypeptide sequences coded by them are numbered SEQ ID NO: 166 to 330 and 492 to 511, see
also Table 1a and 1b) in a patient sample with the average level of expression of the marker
gene(s) in a sample from a control subject (e.g., a human subject without breast cancer). A
preferred sub-set of marker genes representing a specific test composition or kit is listed in Table
2.
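
By way of illustration only, the comparison of the expression level of a marker gene in a patient
sample with the average level in control samples may be sketched as follows. The function name,
the example values and the use of a simple ratio are assumptions made for this sketch; the
invention does not prescribe a particular measure.

```python
from statistics import mean
from typing import List


def fold_change_vs_control(patient_level: float, control_levels: List[float]) -> float:
    """Ratio of a patient's expression level to the average control level.

    Hypothetical helper: a value above 1 indicates higher expression in the
    patient sample, a value below 1 indicates lower expression.
    """
    control_average = mean(control_levels)
    if control_average <= 0:
        raise ValueError("average control expression must be positive")
    return patient_level / control_average


# Illustrative values only (arbitrary expression units, not measured data).
ratio = fold_change_vs_control(30.0, [8.0, 10.0, 12.0])
```
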
The invention relates further to various compositions, methods, reagents and kits, for prediction of
clinically measurable tumor therapy response to a given breast cancer therapy. The compositions, 
methods, and kits of the present invention comprise comparing the level of mRNA expression of a 
single or plurality (e.g. 2, 5, 10, or 50 or more) of breast cancer marker genes in an unclassified 
In one embodiment of the compositions, methods, reagents and kits of the present invention, the
sample to be analyzed is tissue material from a neoplastic lesion taken by aspiration or puncture,
excision or by any other surgical method leading to biopsy or resected cellular material. In one
embodiment of the compositions, methods, and kits of the present invention, the sample comprises
cells obtained from the patient. The cells may be found in a breast cell "smear" collected, for
example, by nipple aspiration, ductal lavage, fine needle biopsy or from provoked or
spontaneous nipple discharge. In another embodiment, the sample is a body fluid. Such fluids
include, for example, blood fluids, lymph, ascitic fluids, gynecological fluids, or urine, but are not
limited to these fluids.

In accordance with the compositions, methods, and kits of the present invention the determination 
of gene expression is not limited to any specific method or to the detection of mRNA. The 
presence and/or level of expression of the marker gene in a sample can be assessed, for example, 
by measuring and/or quantifying of: 

1) a protein encoded by the marker gene in Table 1a and 1b (SEQ ID NO: 1 to 165 and 472 to
491) or a polypeptide comprising a polypeptide selected from SEQ ID NO: 166 to 330 and
492 to 511 or a polypeptide resulting from processing or degradation of the protein (e.g.
using a reagent, such as an antibody, an antibody derivative, or an antibody fragment,
which binds specifically with the protein or polypeptide);

2) a metabolite which is produced directly (i.e., catalyzed) or indirectly by a protein encoded
by the marker gene in Table 1a and 1b (SEQ ID NO: 1 to 165 and 472 to 491) or by a
polypeptide comprising a polypeptide selected from SEQ ID NO: 166 to 330 and 492 to
511;

3) an RNA transcript (e.g., mRNA, hnRNA) encoded by the marker gene in Table 1a and 1b,
or a fragment of the RNA transcript (e.g. by contacting a mixture of RNA transcripts
obtained from the sample or cDNA prepared from the transcripts with a substrate having
nucleic acid comprising a sequence of one or more of the marker genes listed within Table
1a and 1b fixed thereto at selected positions). The mRNA expression of these genes can be
detected e.g. with DNA microarrays as provided by Affymetrix Inc. or other manufacturers
(U.S. Pat. No. 5,556,752). In a further embodiment the expression of these genes
can be detected with bead based direct fluorescent readout techniques such as provided by
Luminex Inc. (PCT No. WO 97/14028).

In one aspect, the present invention provides a composition, method, and kit of assessing whether a 
patient is afflicted with breast cancer (e.g., new detection or "screening", detection of recurrence, 
a) expression of a single or plurality of marker genes in a first sample obtained from the
patient prior to any treatment of the patient, wherein at least one of the marker genes is
selected from the marker genes listed within Table 1a and 1b and

b) expression of the marker gene in a second sample obtained from the patient following at
least one dose of the therapy.

It will be appreciated that in this composition, method, and kit the "therapy" may be any therapy 
for treating breast cancer including, but not limited to, chemotherapy, anti-hormonal therapy, 
directed antibody therapy, radiation therapy and surgical removal of tissue, e.g., a breast tumor. 
Thus, the compositions, methods, and kits of the invention may be used to evaluate a patient 
before, during and after therapy, for example, to evaluate the reduction in tumor burden.

In a further aspect, the present invention provides a composition, method, and kit for monitoring 
the progression of breast cancer in a patient. This composition, method, and kit comprises:

a) detecting in a patient sample at a first time point the expression of a single or plurality of
marker genes, wherein at least one of the marker genes is selected from the marker genes
listed in Table 1a and 1b;

b) repeating step a) at a subsequent point in time; and

c) comparing the level of expression of each marker gene detected in steps a) and b), and
therefrom monitoring the progression of breast cancer in the patient.
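
Steps a) to c) above may be sketched, purely for illustration, as a per-gene comparison of two
measurements. The gene labels, the numeric values and the choice of a ratio as the comparison are
assumptions of this sketch and are not taken from the tables of the invention.

```python
from typing import Dict


def compare_time_points(first: Dict[str, float], second: Dict[str, float]) -> Dict[str, float]:
    """Return, per marker gene, the ratio of the later to the earlier expression level.

    Only genes measured at both time points with a positive first value are
    compared; all other genes are skipped.
    """
    shared = first.keys() & second.keys()
    return {gene: second[gene] / first[gene] for gene in sorted(shared) if first[gene] > 0}


# Hypothetical marker labels and values (arbitrary units).
baseline = {"marker_A": 120.0, "marker_B": 40.0, "marker_C": 15.0}
follow_up = {"marker_A": 60.0, "marker_B": 80.0}
changes = compare_time_points(baseline, follow_up)
```

A ratio below 1 in `changes` indicates that expression of the marker gene fell between the two time
points, and a ratio above 1 that it rose; how such a change is interpreted clinically is outside
this sketch.
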

In another aspect, the invention provides a composition, method, and kit for in vitro selection of a
therapy regime (e.g. the kind of chemotherapeutic agents) for inhibiting breast cancer in a
patient. This composition, method, and kit comprises the steps of:

a) obtaining a sample comprising cancer cells from the patient;

b) separately maintaining aliquots of the sample in the presence of diverse test
compositions;

c) comparing expression of a single or plurality of marker genes, selected from the marker
genes listed in Table 1a and 1b, in each of the aliquots; and

d) selecting one of the test compositions which induces a lower level of expression of genes
from SEQ ID NO: 11, 17, 22, 25, 31, 36, 48, 49, 57, 83, 107, 108, 112, and 159 and/or a higher
from SEQ ID NO: 166 to 330 and 492 to 511. A test compound is contacted with the particular
polypeptide. A biological activity mediated by the polypeptide is detected. A test compound which
decreases the biological activity is thereby identified as a potential therapeutic agent for decreasing
the activity of the particular polypeptide in malignant neoplasia and especially in breast cancer. A
test compound which increases the biological activity is thereby identified as a potential
therapeutic agent for increasing the activity of the particular polypeptide in malignant neoplasia and
especially in breast cancer.

The invention thus provides polypeptides selected from one of the polypeptides with SEQ ID NO:
166 to 330 and 492 to 511 which can be used to identify compounds which may act, for example,
as regulators or modulators such as agonists and antagonists, partial agonists, inverse agonists,
activators, co-activators and inhibitors of the polypeptide comprising a polypeptide selected from
SEQ ID NO: 166 to 330 and 492 to 511. Accordingly, the invention provides reagents and
compositions, methods, and kits for regulating a polypeptide comprising a polypeptide selected
from SEQ ID NO: 166 to 330 and 492 to 511 in malignant neoplasia and more particularly breast
cancer. The regulation can be an up- or down regulation. Reagents that modulate the expression,
stability or amount of a polynucleotide listed in Table 1a and 1b (SEQ ID NO: 1 to 165 and 472 to
491) or the activity of the polypeptide comprising a polypeptide selected from SEQ ID NO: 166 to
330 and 492 to 511 can be a protein, a peptide, a peptidomimetic, a nucleic acid, a nucleic acid
analogue (e.g. peptide nucleic acid, locked nucleic acid) or a small molecule. Compositions,
methods, and kits that modulate the expression, stability or amount of a polynucleotide comprising
a polynucleotide selected from SEQ ID NO: 1 to 165 and 472 to 491 (listed in Table 1a and 1b) or
the activity of the polypeptide comprising a polypeptide selected from SEQ ID NO: 166 to 330 and
492 to 511 (Table 1) can be gene replacement therapies, antisense, ribozyme and triplex nucleic
acid approaches.

The invention further provides a composition, method, and kit for making an isolated hybridoma
which produces an antibody useful for assessing whether a patient is afflicted with breast cancer.
The composition, method, and kit comprises isolating a protein encoded by a marker gene listed
within Table 1a and 1b or a polypeptide fragment of the protein, immunizing a mammal using the
isolated protein or polypeptide fragment, isolating splenocytes from the immunized mammal,
fusing the isolated splenocytes with an immortalized cell line to form hybridomas, and screening
individual hybridomas for production of an antibody which specifically binds with the protein or
polypeptide fragment to isolate the hybridoma. The invention also includes an antibody produced
by this method. Such antibodies specifically bind to a full-length or partial polypeptide comprising
a polypeptide selected from SEQ ID NO: 166 to 330 and 492 to 511 (listed in Table 1a and 1b) for
use in prediction, prevention, diagnosis, prognosis and treatment of malignant neoplasia and breast
cancer in particular.

Yet another embodiment of the invention is the use of a reagent which specifically binds to a
polynucleotide comprising a polynucleotide selected from SEQ ID NO: 1 to 165 and 472 to 491 or
to a polypeptide comprising a polypeptide selected from SEQ ID NO: 166 to 330 and 492 to 511
(listed in Table 1a and 1b) in the preparation of a medicament for the treatment of malignant
neoplasia and breast cancer in particular.

Still another embodiment is the use of a reagent that modulates the activity or stability of a
polypeptide comprising a polypeptide selected from SEQ ID NO: 166 to 330 and 492 to 511
(Table 1a and 1b) or the expression, amount or stability of a polynucleotide comprising a
polynucleotide selected from SEQ ID NO: 1 to 165 and 472 to 491 (Table 1a and 1b) in the
preparation of a medicament for the treatment of malignant neoplasia and breast cancer in
particular.

Still another embodiment of the invention is a pharmaceutical composition which includes a
reagent which specifically binds to a polynucleotide comprising a polynucleotide selected from
SEQ ID NO: 1 to 165 (Table 1) or a polypeptide comprising a polypeptide selected from SEQ ID
NO: 166 to 300, and a pharmaceutically acceptable carrier.

A further embodiment of the invention is a pharmaceutical composition comprising a
polynucleotide including a sequence which hybridizes under stringent conditions to a polynucleotide
comprising a polynucleotide selected from SEQ ID NO: 1 to 165 and 472 to 491 and encoding a
polypeptide exhibiting the same biological function as given for the respective polynucleotide in
Table 1a and 1b or 4, or encoding a polypeptide comprising a polypeptide selected from SEQ ID
NO: 166 to 330 and 492 to 511. Pharmaceutical compositions useful in the present invention may
further include fusion proteins comprising a polypeptide comprising a polynucleotide selected
from SEQ ID NO: 1 to 165 and 472 to 491, or a fragment thereof, antibodies, or antibody
fragments.

The invention also provides various kits. Such a kit comprises reagents for assessing expression of a
single or a plurality of genes selected from the marker genes listed in Table 1a and 1b or selected
from the sub-set of genes listed in Table 2.

In one aspect, the invention provides a kit for assessing whether a patient is afflicted with breast
In another aspect, the invention provides a kit for assessing the suitability of each of a plurality of
compounds for inhibiting a breast cancer in a patient. The kit comprises reagents for assessing
expression of a marker gene listed within Table 1a and 1b, or reagents for assessing the expression
of each marker gene of a marker gene set listed in Table 2. The kit may also comprise a plurality of
compounds.



In an additional aspect, the invention provides a kit for assessing the presence of breast cancer
cells. This kit comprises an antibody, wherein the antibody binds specifically with a protein
encoded by a marker gene listed within Table 1a and 1b or a polypeptide fragment of the protein.
The kit may also comprise a plurality of antibodies, wherein the plurality binds specifically with
the protein encoded by each marker gene of a marker gene set listed in Table 2.

In yet another aspect, the invention provides a kit for assessing the presence of breast cancer cells,
wherein the kit comprises a nucleic acid probe. The probe hybridizes specifically with an RNA
transcript of a marker gene listed within Table 1a and 1b or cDNA of the transcript. The kit may
also comprise a plurality of probes, wherein each of the probes hybridizes specifically with an RNA
transcript of one of the marker genes of a marker gene set listed in Table 2.

It will be appreciated that the compositions, methods, and kits of the present invention may also
include known cancer marker genes including known breast cancer marker genes. It will further be
appreciated that the compositions, methods, and kits may be used to identify cancers other than
breast cancer.

DETAILED DESCRIPTION OF THE INVENTION
DEFINITIONS

"Differential expression", or "expression" as used herein, refers to both quantitative as well as
qualitative differences in the genes' expression patterns depending on differential development,
different genetic background of tumor cells and/or reaction to the tissue environment of the tumor.
Differentially expressed genes may represent "marker genes" and/or "target genes". The
expression pattern of a differentially expressed gene disclosed herein may be utilized as part of a
prognostic or diagnostic breast cancer evaluation. Alternatively, a differentially expressed gene
disclosed herein may be used in methods for identifying reagents and compounds and uses of these
reagents and compounds for the treatment of breast cancer as well as methods of treatment. The
differential regulation of the gene is not limited to a specific cancer cell type or clone, but rather
displays the interplay of cancer cells, muscle cells, stromal cells, connective tissue cells, other
epithelial cells, endothelial cells and blood vessels as well as cells of the immune system (e.g.
lymphocytes, macrophages, killer cells).

"Biological activity" or "bioactivity" or "activity" or "biological function", which are used
interchangeably herein, mean an effector or antigenic function that is directly or indirectly
performed by a polypeptide (whether in its native or denatured conformation), or by any fragment
thereof in vivo or in vitro. Biological activities include but are not limited to binding to
polypeptides, binding to other proteins or molecules, enzymatic activity, signal transduction,
activity as a DNA binding protein, as a transcription regulator, ability to bind damaged DNA, etc.
A bioactivity can be modulated by directly affecting the subject polypeptide. Alternatively, a
bioactivity can be altered by modulating the level of the polypeptide, such as by modulating
expression of the corresponding gene.

The term "marker" or "biomarker" refers to a biological molecule, e.g., a nucleic acid, peptide,
hormone, etc., whose presence or concentration can be detected and correlated with a known
condition, such as a disease state.

The term "marker gene," as used herein, refers to a differentially expressed gene whose expression
pattern may be utilized as part of a predictive, prognostic or diagnostic process in malignant
neoplasm or breast cancer evaluation, or which, alternatively, may be used in methods for
identifying compounds useful for the treatment or prevention of malignant neoplasia and breast
cancer in particular. A marker gene may also have the characteristics of a target gene.

"Target gene", as used herein, refers to a differentially expressed gene involved in breast cancer in
a manner by which modulation of the level of target gene expression or of target gene product
activity may act to ameliorate symptoms of malignant neoplasia and breast cancer in particular. A
target gene may also have the characteristics of a marker gene.

The term "neoplastic lesion" or "neoplastic disease" or "neoplasia" refers to a cancerous tissue;
this includes carcinomas (e.g., carcinoma in situ, invasive carcinoma, metastatic carcinoma) and
pre-malignant conditions, neomorphic changes independent of their histological origin (e.g. ductal,
lobular, medullary, mixed origin). The term "cancer" is not limited to any stage, grade,
histomorphological feature, invasiveness, aggressiveness or malignancy of an affected tissue or cell
aggregation. In particular stage 0 breast cancer, stage I breast cancer, stage II breast cancer, stage
III breast cancer, stage IV breast cancer, grade I breast cancer, grade II breast cancer, grade III
breast cancer, malignant breast cancer, primary carcinomas of the breast, and all other types of
cancers, malignancies and transformations associated with the breast are included. The terms
"neoplastic lesion" or "neoplastic disease" or "neoplasia" or "cancer" are not limited to any tissue
or cell type. They also include primary or metastatic lesions [...] and
comprise lymph nodes affected by cancer cells or minimal residual disease cells, either
deposited (e.g. bone marrow, liver, kidney) or freely floating throughout the patient's body.

The term "biological sample", as used herein, refers to a sample obtained from an organism or
from components (e.g., cells) of an organism. The sample may be of any biological tissue or fluid.
Frequently the sample will be a "clinical sample", which is a sample derived from a patient. Such
samples include, but are not limited to, sputum, blood, blood cells (e.g., white cells), tissue or fine
needle biopsy samples, cell-containing body fluids, free floating nucleic acids, urine, peritoneal
fluid, and pleural fluid, or cells therefrom. Biological samples may also include sections of tissues
such as frozen or fixed sections taken for histological purposes. A biological sample to be analyzed
is tissue material from a neoplastic lesion taken by aspiration or puncture, excision or by any
other surgical method leading to biopsy or resected cellular material. Such a biological sample
comprises cells obtained from a patient. The cells may be found in a breast cell "smear" collected,
for example, by nipple aspiration, ductal lavage, fine needle biopsy or from provoked or
spontaneous nipple discharge. In another embodiment, the sample is a body fluid. Such fluids
include, for example, blood fluids, lymph, ascitic fluids, gynecological fluids, or urine, but are not
limited to these fluids.

The terms "therapy modality", "therapy mode", "regimen" or "chemo regimen" as well as "therapy
[...] simultaneous [...] and/or immune
stimulating, and/or blood cell proliferative agents, and/or radiation therapy, and/or hyperthermia,
and/or hypothermia for cancer therapy. The administration of these can be performed in an
adjuvant and/or neoadjuvant mode. The composition of such a "protocol" may vary in dose of the
single agent, timeframe of application and frequency of administration within a defined therapy
window. Currently various combinations of various drugs and/or physical methods, and various
schedules are under investigation.

By "array" or "matrix" is meant an arrangement of addressable locations or "addresses" on a
device. The locations can be arranged in two dimensional arrays, three dimensional arrays, or other
matrix formats. The number of locations can range from several to at least hundreds of thousands.
Most importantly, each location represents a totally independent reaction site. Arrays include but
are not limited to nucleic acid arrays, protein arrays and antibody arrays. A "nucleic acid array"
refers to an array containing nucleic acid probes, such as oligonucleotides, polynucleotides or
larger portions of genes. The nucleic acid on the array is preferably single stranded. Arrays
wherein the probes are oligonucleotides are referred to as "oligonucleotide arrays" or
"oligonucleotide chips." A "microarray" herein also refers to a "biochip" or "biological chip", an
array of regions having a density of discrete regions of at least about 100/cm², and preferably at
least about 1000/cm². The regions in a microarray have typical dimensions, e.g., diameters, in the
range of between about 10-250 µm and are separated from other regions in the array by about the
same distance. A "protein array" refers to an array containing polypeptide probes or protein probes
which can be in native form or denatured. An "antibody array" refers to an array containing
antibodies which include but are not limited to monoclonal antibodies (e.g. from a mouse),
chimeric antibodies, humanized antibodies or phage antibodies and single chain antibodies as well
as fragments from antibodies.

The term "agonist", as used herein, is meant to refer to an agent that mimics or upregulates (e.g.,
potentiates or supplements) the bioactivity of a protein. An agonist can be a wild-type protein or
derivative thereof having at least one bioactivity of the wild-type protein. An agonist can also be a
compound that upregulates expression of a gene or which increases at least one bioactivity of a
protein. An agonist can also be a compound which increases the interaction of a polypeptide with
another molecule, e.g., a target peptide or nucleic acid.

The term "antagonist" as used herein is meant to refer to an agent that downregulates (e.g.,
suppresses or inhibits) at least one bioactivity of a protein. An antagonist can be a compound
which inhibits or decreases the interaction between a protein and another molecule, e.g., a target
peptide, a ligand or an enzyme substrate. An antagonist can also be a compound that
downregulates expression of a gene or which reduces the amount of expressed protein present.

"Small molecule" as used herein, is meant to refer to a composition, which has a molecular weight
of less than about 5 kD and most preferably less than about 4 kD. Small molecules can be nucleic
acids, peptides, polypeptides, peptidomimetics, carbohydrates, lipids or other organic
(carbon-containing) or inorganic molecules. Many pharmaceutical companies have extensive libraries
of chemical and/or biological mixtures, often fungal, bacterial, or algal extracts, which can be
screened with any of the assays of the invention to identify compounds that modulate a bioactivity.

The terms "modulated" or "modulation" or "regulated" or "regulation" and "differentially
regulated" as used herein refer to both upregulation [i.e., activation or stimulation (e.g., by
agonizing or potentiating)] and down regulation [i.e., inhibition or suppression (e.g., by
antagonizing, decreasing or inhibiting)].

"Transcriptional regulatory unit" refers to DNA sequences, such as initiation signals, enhancers,
and promoters, which induce or control transcription of protein coding sequences with which they
are operably linked. In preferred embodiments, transcription of one of the genes is under the
control of a promoter sequence (or other transcriptional regulatory sequence) which controls the
expression of the recombinant gene in a cell-type in which expression is intended. It will also be
understood that the recombinant gene can be under the control of transcriptional regulatory
sequences which are the same or which are different from those sequences which control
transcription of the naturally occurring forms of the polypeptide.

The term "derivative" refers to the chemical modification of a polypeptide sequence, or a
polynucleotide sequence. Chemical modifications of a polynucleotide sequence can include, for
example, replacement of hydrogen by an alkyl, acyl, or amino group. A derivative polynucleotide
encodes a polypeptide which retains at least one biological or immunological function of the
natural molecule. A derivative polypeptide is one modified by glycosylation, pegylation, or any
similar process that retains at least one biological or immunological function of the polypeptide
from which it was derived.

The term "nucleotide analog" refers to oligomers or polymers being at least in one feature different
from naturally occurring nucleotides, oligonucleotides or polynucleotides, but exhibiting
functional features of the respective naturally occurring nucleotides (e.g. [...]
hybridization, coding information) and that can be used for said compositions. The nucleotide
[...] LNAs, PNAs and Morpholinos. The nucleotide analog has at least one molecule different from its
naturally occurring counterpart or equivalent.

"BREAST CANCER GENES" or "BREAST CANCER GENE" as used herein refers to the
polynucleotides of SEQ ID NO: 1 to 165 and 472 to 491 (listed in Table 1a and 1b) as well as
derivatives, fragments, analogs and homologues thereof, the polypeptides encoded thereby (SEQ
ID NO: 166 to 330 and 492 to 511, see Table 1) as well as derivatives, fragments, analogs and
homologues thereof and the corresponding genomic transcription units which can be derived or
identified with standard techniques well known in the art using the information disclosed in Tables
1 to 5. The gene name, reference sequence, unique gene identifier, and LocusLink ID numbers
of the polynucleotide sequences of the SEQ ID NO: 1 to 65 and the polypeptides of the SEQ ID
NO: 166 to 330 and 492 to 511 are shown in Table 1a and 1b; the gene description, gene function
and subcellular localization are given in Tables 4a and 4b.

The term "chromosomal region" as used herein refers to a consecutive DNA stretch on a
chromosome which can be defined by cytogenetic or other genetic markers such as e.g. restriction
fragment length polymorphisms (RFLPs), single nucleotide polymorphisms (SNPs), expressed sequence
tags (ESTs), sequence tagged sites (STSs), microsatellites, variable number of tandem repeats (VNTRs)
and genes. Typically a chromosomal region consists of up to 2 Megabases (MB), up to 4 MB, up
to 6 MB, up to 8 MB, up to 10 MB, up to 20 MB or even more MB.
The term "kit" as used herein refers to any manufacture (e.g. a diagnostic or research [...])
comprising at least one reagent, e.g. a probe, for [...] detecting the expression of at least
one marker gene disclosed in the invention, [...] the manufacture being sold, distributed, and/or
promoted as a unit for performing the methods of the present invention. The genes, primers and
probes listed in Table 2 and 5, or any combination of at least two of them, are regarded as one
single test for the purposes, methods [...] invention. Also reagents (e.g. immunoassays) [...]

The present invention provides polynucleotide sequences and proteins encoded thereby, as well as
probes derived from the polynucleotide sequences, antibodies directed to the encoded proteins, and
predictive, preventive, diagnostic, prognostic and therapeutic uses for individuals which are at risk
for or which have malignant neoplasia and breast cancer in particular.

The present invention is based on the identification of [...] genes [...]
(up- or down regulated) in tumor biopsies of patients with clinical evidence of breast cancer [...]
breast cancer. The gene names, the database accession numbers (gene name, reference [...])
[...] of the encoded proteins and their subcellular localization are given in Tables 1 to 4a [...].
[...] primer sequences used for the gene amplification and hybridization probes are shown in Table 5.
The present invention relates to: 

1. A method for characterizing the state of a neoplastic disease in a subject, comprising

(i) determining the pattern of expression levels of at least 6, 8, 10, 15, 20, 30 or 4[...]
marker genes, comprised in a group of marker genes consisting of SEQ ID NO: 1 to
165 and 472 to 491, in a biological sample from said subject,

(ii) comparing the pattern of expression levels determined in (i) with one or several
reference pattern(s) of expression levels,

(iii) characterizing the state of said neoplastic disease in said subject from the outcome
of the comparison in step (ii).
2. A method for detection, diagnosis, screening, monitoring, and/or prognosis of a neoplastic
disease in a subject (preferably ex vivo), comprising

(i) determining the pattern of expression levels of at least 1, 2, 3, 5, 10, 15, 20, 30 or
47 marker genes, comprised in a group of marker genes consisting of SEQ ID
NOs: 1 to 17, 19 to 33, 35 to 50, 52 to 64, 66 to 85, 88 to 91, and 93 to 165 and 472
to 491 in biological samples from said subject,

(ii) comparing the pattern of expression levels determined in (i) with one or several
reference pattern(s) of expression levels,

(iii) detecting, diagnosing, screening, monitoring, and/or prognosing said neoplastic
disease in said subject from the outcome of the comparison in step (ii).
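
The comparison of step (ii) may be illustrated by the following sketch, which picks the reference
pattern most similar to a sample pattern. The use of the Pearson correlation coefficient as the
similarity measure, the labels "state_X" and "state_Y" and all numeric values are assumptions made
for this illustration only; the text above does not fix a particular comparison method.

```python
from math import sqrt
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equally long expression patterns."""
    if len(x) != len(y) or not x:
        raise ValueError("patterns must be non-empty and of equal length")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant pattern")
    return covariance / (spread_x * spread_y)


def closest_reference(sample: List[float], references: Dict[str, List[float]]) -> str:
    """Return the label of the reference pattern that correlates best with the sample."""
    return max(references, key=lambda label: pearson(sample, references[label]))


# Hypothetical expression levels for four marker genes, in the same gene order.
sample_pattern = [1.0, 5.0, 2.0, 8.0]
reference_patterns = {
    "state_X": [1.2, 4.8, 2.5, 7.5],
    "state_Y": [8.0, 2.0, 6.0, 1.0],
}
best_match = closest_reference(sample_pattern, reference_patterns)
```
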

Determination of an expression level can comprise a quantification of the expression level 
and/or a purely qualitative determination of the expression level. 

A "pattern of expression levels" of a single gene is to be understood as the expression level of said 
gene as determined by suitable methods. 

Nucleic acid molecules, referred to with a specific SEQ ID NO, within the meaning of the
invention, are to be understood as comprising also variants of said nucleic acid molecules which
can be derived from the original nucleic acid molecules by deletion, insertion or transposition of
nucleotides, provided said variants still have an 80, 90, 95, or 99% sequence identity towards the
original sequence. Preferably the variants still have the same biological activity and/or function as
have the original molecules.

It is obvious to the person skilled in the art that a reference to a nucleotide sequence is meant to
comprise the reference to the associated protein sequence which is coded by said nucleotide
sequence.



"% identity" of a first sequence towards a second sequence, within the meaning of the invention,
means the % identity which is calculated as follows: First, the alignment between
the two sequences is determined with the CLUSTALW algorithm [Thompson JD, Higgins DG,
Gibson TJ. 1994. ClustalW: Improving the sensitivity of progressive multiple sequence alignment
through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic
Acids Res., 22: 4673-4680], Version 1.8, applying the following command line syntax: ./clustalw
-infile=./infile.txt -output= -outorder=aligned -pwmatrix=gonnet -pwdnamatrix=clustalw
-pwgapopen=10.0 -pwgapext=0.1 -matrix=gonnet -gapopen=10.0 -gapext=0.05 -gapdist=8
-hgapresidues=GPSNDQERK -maxdiv=40. Implementations of the CLUSTAL W algorithm are
readily available at numerous sites on the internet, including, e.g., http://www.ebi.ac.uk.
Thereafter, the number of matches in the alignment is determined by counting the number of
identical nucleotides (or amino acid residues) in aligned positions. Finally, the total number of
matches is divided by the number of nucleotides (or amino acid residues) of the longer of the two
sequences, and multiplied by 100 to yield the % identity of the first sequence towards the second
sequence.
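
The counting rule described above can be sketched as follows. The sketch assumes the two sequences have already been aligned (for example by CLUSTAL W) and are supplied as equal-length strings in which "-" marks a gap; the function name and the gap symbol are illustrative assumptions, not part of the described method.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """% identity: matches in aligned positions / length of the longer sequence * 100."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    a = aligned_a.upper()
    b = aligned_b.upper()
    # A match is an identical residue in an aligned position (gaps never match).
    matches = sum(1 for x, y in zip(a, b) if x == y and x != gap)
    # Sequence lengths are counted without gap characters.
    len_a = len(a) - a.count(gap)
    len_b = len(b) - b.count(gap)
    longer = max(len_a, len_b)
    if longer == 0:
        raise ValueError("both sequences are empty")
    return 100.0 * matches / longer

# Hypothetical alignment: 7 matches, longer sequence has 9 residues.
identity = percent_identity("ACGT-ACGT", "ACGTTACGA")
```

Dividing by the longer sequence, as the text specifies, means that terminal or internal gaps in the shorter sequence lower the reported identity.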

3. A method of count 1 or 2, wherein said method comprises multiple determinations of a
pattern of expression levels at different points in time, thereby allowing the
development of said neoplastic disease in said subject to be monitored.

4. A method of count 1, wherein said method comprises an estimation of the likelihood of
success of a given mode of treatment for said neoplastic disease in said subject.

5. A method of count 1, wherein said method comprises an assessment of whether the subject
is expected to respond or not to respond to a given mode of treatment
for said neoplastic disease.

The terms "to respond" or "not to respond" are to be understood in a qualitative and/or in a
quantitative fashion. "To respond" and "not to respond" are to be assessed with regard to suitable
reference responses, such as, e.g., responses shown by "responders" and "non-responders" to a
certain mode of treatment or modality of treatment.

6. A method of count 4 or 5, wherein a predictive algorithm is used.

Predictive algorithms, which are well known to a person skilled in the art of data analysis, are to be
understood as being any kind of predictive algorithm known in the art. A preferred example of such
an algorithm is the SVM algorithm disclosed in Example 4.

7. A method of count 6, wherein the predictive algorithm is a Support Vector Machine. 

Support Vector Machines are algorithms well known to the person skilled in the art of data
analysis. A Support Vector Machine algorithm is disclosed in Example 4.
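
To make the idea of a Support Vector Machine concrete, the following is a minimal sketch of a generic linear SVM trained by sub-gradient descent on the regularized hinge loss. It is not the algorithm of Example 4, which is not reproduced in this section; the hyperparameters, the two-gene expression vectors and the +1/-1 coding of responders and non-responders are all hypothetical.

```python
def train_linear_svm(samples, labels, epochs=200, learning_rate=0.01, reg=0.01):
    """Fit weights w and bias b by sub-gradient descent on the regularized hinge loss."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):  # y is +1 or -1
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Misclassified or inside the margin: hinge term and regularizer both act.
                w = [wi - learning_rate * (reg * wi - y * xi) for wi, xi in zip(w, x)]
                b += learning_rate * y
            else:
                # Correct with sufficient margin: only the regularizer shrinks w.
                w = [wi - learning_rate * reg * wi for wi in w]
    return w, b

def predict(w, b, x):
    """Return +1 (expected to respond) or -1 (expected not to respond)."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Hypothetical expression levels of two marker genes per sample.
train_x = [[2.0, 0.5], [2.5, 0.3], [1.8, 0.7], [0.4, 2.2], [0.6, 2.6], [0.2, 1.9]]
train_y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(train_x, train_y)
```

A call such as `predict(w, b, [2.2, 0.4])` then assigns a new sample to one of the two classes; in practice a tested library implementation and proper cross-validation would be used instead of this toy trainer.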

8. A method of any of counts 4 to 7, wherein said given mode of treatment

(i) acts on cell proliferation, and/or

(ii) acts on cell survival, and/or

(iii) acts on cell motility, and/or

(iv) is an anthracycline based mode of treatment, and/or

(v) comprises administration of epirubicin and/or cyclophosphamide.

9. A method of treatment for a subject afflicted with a neoplastic disease, comprising

(i) identifying a promising mode of treatment with the method of count 4 or 5,

(ii) treating said neoplastic disease in said patient by the mode of treatment identified
in step (i).



10. A method of screening for subjects afflicted with a neoplastic disease, wherein the method
of count 1 or 2 is applied to a plurality of subjects.

11. A method of screening for substances and/or therapy modalities having curative effect on a
neoplastic disease, comprising

(i) obtaining a biological sample from a subject afflicted with said neoplastic disease,

(ii) assessing, from said biological sample, using the method of count 4 or 5, whether
said subject is expected to respond to a given mode of treatment for said neoplastic
disease,

(iii) if said subject is expected to respond to said given mode of treatment, incubating
said biological sample with said substance under said therapy modalities,

(iv) observing changes in said biological sample triggered by said test substance under
said therapy modalities,

(v) selecting or rejecting said test substance and/or said therapy modalities, based on
the observation of changes in said biological sample under (iv).

Selecting specific biological samples of, e.g., good responders to a given therapy can help to
identify novel substances and/or therapy modalities for the treatment of said specific neoplastic
disease.

12. A method of screening for compounds having curative effect on a neoplastic disease,
comprising

(i) incubating biological samples or extracts of these with a test substance,

(ii) determining the pattern of expression levels of marker genes [...]
to 491 in said biological sample,

(iii) comparing the pattern of expression levels determined in (ii) with one or several
reference pattern(s),

(iv) selecting or rejecting said test substance [...].

13. A method of any of counts 1 to 12, wherein said marker genes are comprised in a group of
marker genes listed in Table 2.

14. A method of any of counts 1 to 13, wherein the expression level is determined

(i) with a hybridization based method, or

(ii) with a hybridization based method utilizing arrayed probes, or

(iii) with a hybridization based method utilizing individually labeled probes, or

(iv) by real time PCR, or

(v) by assessing the expression of polypeptides, proteins or derivatives thereof, or

(vi) by assessing the amount of polypeptides, proteins or derivatives thereof.

15. A method of any of counts 1 to 14, wherein the neoplastic disease is breast cancer.

The methods of the invention are preferably performed ex vivo. More preferably, methods of the
invention are performed ex vivo on samples that are already available, without the
intervention of a physician or other medically trained personnel.

16. A kit comprising at least 6, 8, 10, 15, 20, 30, or 47 primer pairs and probes suitable for
marker genes comprised in a group of marker genes consisting of

(i) SEQ ID NO: 1 to SEQ ID NO: 165, or

(ii) the marker genes listed in Table 2.

17. A kit comprising at least 6, 8, 10, 15, 20, 30, or 47 sets of individually labeled probes, each
having a sequence comprised in a group of sequences consisting of SEQ ID NO:331 to
SEQ ID NO:471.

18. A kit comprising at least 6, 8, 10, 15, 20, 30, or 47 sets of arrayed probes, each having a
sequence comprised in a group of sequences consisting of SEQ ID NO:331 to SEQ ID
NO:471.

Biological relevance of the genes which are part of the invention

Some of the genes listed in Table 1a and 1b represent biological, cellular processes and are
characterized by similar regulation of genes. By way of illustration, but not limited to the following
examples, a few characteristic genes from Table 1 are described below in greater detail:

MAD2L1 

The initiation of chromosome segregation at anaphase is linked by the spindle assembly
checkpoint to the completion of chromosome-microtubule attachment during metaphase. To
determine the function of the Mad2 protein during normal cell division, knockout experiments in
mice were performed. The knockout cells were unable to arrest in response to spindle disruption. At
embryonic day 6.5, the cells of the epiblast began rapid cell division, and the absence of a
checkpoint resulted in widespread chromosome missegregation and apoptosis. In contrast, the
postmitotic trophoblast giant cells survived without Mad2. Thus, the spindle assembly checkpoint
is required for accurate chromosome segregation in mitotic mouse cells and for embryonic
viability, even in the absence of spindle damage.

Meiosis I nondisjunction in spindle checkpoint mutants could be prevented by delaying the onset 
of anaphase. In a recombinant-defective mutant, the checkpoint delayed the biochemical events of 
anaphase I, suggesting that chromosomes that are attached to microtubules but are not under 
tension can activate the spindle checkpoint. Spindle checkpoint mutants reduced the accuracy of 
chromosome segregation in meiosis I much more than that in meiosis II, suggesting that checkpoint
defects may contribute to Down syndrome and possibly to the "chaotic" polyploidy observed in 
cancer. 

IGFBP4

Seven structurally distinct insulin-like growth factor binding proteins have been isolated and their
cDNAs cloned: IGFBP1, IGFBP2, IGFBP3, IGFBP4, IGFBP5, IGFBP6, and IGFBP7. The
proteins display strong sequence homologies, suggesting that they are encoded by a closely related
family of genes. The IGFBPs contain 3 structurally distinct domains, each comprising
approximately one-third of the molecule. The N-terminal domain 1 and the C-terminal domain 3 of
the 6 human IGFBPs show moderate to high levels of sequence identity, including 12 and 6
invariant cysteine residues in domains 1 and 3, respectively (IGFBP6 contains 10 cysteine residues
in domain 1), and are thought to be the IGF binding domains. Domain 2 is defined primarily by a
lack of sequence identity among the 6 IGFBPs and by a lack of cysteine residues, though it does
contain 2 cysteines in IGFBP4. Domain 3 is homologous to the thyroglobulin type I repeat unit.
Studies suggested that the primary effect of the proteins is the attenuation of IGF activity and
suggested that they contribute to the control of IGF-mediated cell growth and metabolism.



DDB2 



In human cells, efficient global genomic repair of DNA damage induced by ultraviolet radiation
requires the p53 tumor suppressor. The p48 gene is required for expression of an ultraviolet
radiation-damaged DNA-binding activity and is disrupted by mutations in the subset of xeroderma
pigmentosum group E cells that lack this activity, DDB-negative XPE. p48 mRNA levels
strongly depend on basal p53 expression and increase further after DNA damage in a
p53-dependent manner. Furthermore, like p53 -/- cells, xeroderma pigmentosum group E cells are
deficient in global genomic repair. These results identified p48 as a link between p53 and the
nucleotide excision-repair apparatus.

UV-damaged DNA-binding activity (UV-DDB) is deficient in cell lines and primary tissues from
rodents. Transfection of p48 conferred UV-DDB to hamster cells and enhanced removal of
cyclobutane pyrimidine dimers (CPDs) from genomic DNA and from the nontranscribed strand of
an expressed gene. Expression of p48 suppressed UV-induced mutations arising from the
nontranscribed strand but had no effect on cellular UV sensitivity. The results defined the role of
p48 in DNA repair, demonstrated the importance of CPDs in mutagenesis, and suggested how
rodent models can be improved to better reflect cancer susceptibility in humans.

HSPA2 

Several heat-shock protein genes are located in the major histocompatibility complex on
chromosome 6, e.g., HSPA1. However, HSPA2 is located on 14q22-q24. The isolated clone for
HSPA2 is characterized by a single open reading frame of 1,917 basepairs that encodes a 639
amino acid protein with a predicted molecular weight of 70,030 Da. Analysis of the sequence
indicated that HSPA2 is the human homolog of the murine Hsp70-2 gene, with 91.7% identity in
the nucleotide coding sequence and 98.2% in the corresponding amino acid sequence. HSPA2 has
less amino acid homology to the other members of the human HSP70 gene family. HSPA2 is
constitutively expressed in most tissues, with very high levels in testis and skeletal muscle. HSPA2
is expressed abundantly in muscle, heart, esophagus, and brain, and to a lesser extent in testis.
Female homozygous knockout mice for Hsp70-2 undergo normal meiosis and are fertile. In contrast,
homozygous male knockout mice lacked postmeiotic spermatids and mature sperm and were
infertile. Hsp70-2 is normally associated with synaptonemal complexes in the nuclei of meiotic
spermatocytes. In the male knockouts, these structures were abnormal by late prophase. A large
increase in spermatocyte apoptosis was also observed.

Polynucleotides 

A "BREAST CANCER GENE" polynucleotide can be single- or double-stranded and comprises a
coding sequence or the complement of a coding sequence for a "BREAST CANCER GENE"
polypeptide. Degenerate nucleotide sequences encoding human "BREAST CANCER GENE"
polypeptides, as well as homologous nucleotide sequences which are at least about 50, 55, 60, 65,
70, preferably about 75, 90, 96, or 98% identical to the nucleotide sequences of SEQ ID NO: 1 to
165 and 472 to 491 also are "BREAST CANCER GENE" polynucleotides. Percent sequence
identity between the sequences of two polynucleotides is determined using computer programs
such as ALIGN which employ the FASTA algorithm, using an affine gap search with a gap open
penalty of -12 and a gap extension penalty of -2. Complementary DNA (cDNA) molecules, species
homologues, and variants of "BREAST CANCER GENE" polynucleotides which encode
biologically active "BREAST CANCER GENE" polypeptides also are "BREAST CANCER
GENE" polynucleotides.

Preparation of Polynucleotides

A naturally occurring "BREAST CANCER GENE" polynucleotide can be isolated free of other
cellular components such as membrane components, proteins, and lipids. Polynucleotides can be
made by a cell and isolated using standard nucleic acid purification techniques, or synthesized
using an amplification technique, such as the polymerase chain reaction (PCR), or by using an
automatic synthesizer. Methods for isolating polynucleotides are routine and are known in the art.
Any such technique for obtaining a polynucleotide can be used to obtain isolated "BREAST
CANCER GENE" polynucleotides. For example, restriction enzymes and probes can be used to
isolate polynucleotide fragments which comprise "BREAST CANCER GENE" nucleotide
sequences. Isolated polynucleotides are in preparations which are free or at least 70, 80, or 90%
free of other molecules.

"BREAST CANCER GENE" cDNA molecules can be made with standard molecular biology
techniques [...] (7), both of which are incorporated herein by reference.
An amplification technique, such as PCR, can be used to obtain additional copies of
polynucleotides of the invention, using either human genomic DNA or cDNA as a template.

Synthetic chemistry techniques can be used to synthesize "BREAST CANCER GENE"
polynucleotides [...] "BREAST CANCER GENE" polypeptide or a biologically
active variant thereof.

Identification of differential expression

Transcripts within the collected RNA samples which represent differentially
expressed genes may be identified by utilizing a variety of methods which are well known to those
of skill in the art. For example, differential screening [Tedder et al.], subtractive
hybridization [Hedrick, S. M. et al., 1984; Lee, S. W. et al.], and
differential display (Liang, P., and Pardee, A. B., 1993, U.S. Pat. No. 5,262,311, which is
incorporated herein by reference in its entirety) [...]
corresponding to the mRNA population of a second cell type. For example, one cDNA probe may
correspond to a total cell cDNA probe of a cell type derived from a control subject, while the
second cDNA probe may correspond to a total cell cDNA probe of the same cell type derived from
an experimental subject. Those clones which hybridize to one probe but not to the other potentially
represent clones derived from genes differentially expressed in the cell type of interest in control
versus experimental subjects.

Subtractive hybridization techniques generally involve the isolation of mRNA taken from two
different sources, e.g., control and experimental tissue, the hybridization of the mRNA or single-
stranded cDNA reverse-transcribed from the isolated mRNA, and the removal of all hybridized,
and therefore double-stranded, sequences. The remaining non-hybridized, single-stranded cDNAs
potentially represent clones derived from genes that are differentially expressed in the two mRNA
sources. Such single-stranded cDNAs are then used as the starting material for the construction of
a library comprising clones derived from differentially expressed genes.

The differential display technique describes a procedure, utilizing the well known polymerase 
chain reaction (PCR; the experimental embodiment set forth in Mullis, K. B., 1987, U.S. Pat. No. 
4,683,202) which allows for the identification of sequences derived from genes which are 
differentially expressed. First, isolated RNA is reverse-transcribed into single-stranded cDNA 
utilizing standard techniques which are well known to those of skill in the art. Primers for the 
reverse transcriptase reaction may include, but are not limited to, oligo dT-containing primers, 
preferably of the reverse primer type of oligonucleotide described below. Next, this technique uses 
pairs of PCR primers, as described below, which allow for the amplification of clones representing 
a random subset of the RNA transcripts present within any given cell. Utilizing different pairs of 
primers allows each of the mRNA transcripts present in a cell to be amplified. Among such 
amplified transcripts may be identified those which have been produced from differentially 
expressed genes. 



The reverse oligonucleotide primer of the primer pairs may contain an oligo dT stretch of
nucleotides, preferably eleven nucleotides long, at its 5' end, which hybridizes to the poly(A) tail
of mRNA or to the complement of a cDNA reverse transcribed from an mRNA poly(A) tail.
Second, in order to increase the specificity of the reverse primer, the primer may contain one or
more, preferably two, additional nucleotides at its 3' end. Because, statistically, only a subset of the
mRNA derived sequences present in the sample of interest will hybridize to such primers, the
additional nucleotides allow the primers to amplify only a subset of the mRNA derived sequences
present in the sample of interest. This is preferred in that it allows more accurate and complete
visualization and characterization of each of the bands representing amplified sequences.
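
The reverse primer layout described above (an oligo dT stretch at the 5' end followed by additional nucleotides at the 3' end) can be enumerated with a short sketch. The function name is illustrative, and the sketch simply lists every possible two-nucleotide anchor without filtering, which is an assumption; a practical primer set may exclude some anchors.

```python
from itertools import product

def anchored_oligo_dt_primers(dt_length=11, anchor_length=2):
    """List reverse primers written 5'->3': an oligo dT stretch, then anchor nucleotides."""
    stretch = "T" * dt_length
    return [stretch + "".join(anchor) for anchor in product("ACGT", repeat=anchor_length)]

# Preferred layout from the text: eleven T's plus two additional 3' nucleotides.
primers = anchored_oligo_dt_primers()
```

Each anchor restricts priming to the subset of poly(A) tails preceded by the complementary bases, which is why every primer amplifies only part of the mRNA-derived population.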

The forward primer may contain a nucleotide sequence expected, statistically, to have the ability to
hybridize to cDNA sequences derived from the tissues of interest. The nucleotide sequence may be
an arbitrary one, and the length of the forward oligonucleotide primer may range from about 9 to
about 13 nucleotides, with about 10 nucleotides being preferred. Arbitrary primer sequences cause
the lengths of the amplified partial cDNAs produced to be variable, thus allowing different clones
to be separated by using standard denaturing sequencing gel electrophoresis. PCR reaction
conditions should be chosen which optimize amplified product yield and specificity and which
produce amplified products of lengths which may be resolved utilizing standard gel
electrophoresis techniques. Such reaction conditions are well known to those of skill in the art, and
important reaction parameters include, for example, length and nucleotide sequence of
oligonucleotide primers as discussed above, and annealing and elongation step temperatures and
reaction times. The pattern of clones resulting from the reverse transcription and
amplification of the mRNA of two different cell types is displayed via sequencing gel
electrophoresis and compared. Differences in the two banding patterns indicate potentially
differentially expressed genes.



When screening for full-length cDNAs, it is preferable to use libraries that have been size-selected
to include larger cDNAs. Randomly-primed libraries are preferable, in that they will contain more
sequences which contain the 5' regions of genes. Use of a randomly primed library may be
especially preferable for situations in which an oligo d(T) library does not yield a full-length
cDNA.

Commercially available capillary electrophoresis systems can be used to analyze the size or
confirm the nucleotide sequence of PCR or sequencing products. For example, capillary [...]
dyes (one for each nucleotide) which are laser activated, and detection of the emitted wavelengths
by a charge coupled device camera. Output/light intensity can be converted to electrical signal
using appropriate software (e.g. GENOTYPER and Sequence NAVIGATOR, Perkin Elmer, ABI),
and the entire process from loading of samples to computer analysis and electronic data display
[...] small pieces of DNA which might be present in limited amounts in a particular sample.
Once potentially differentially expressed gene sequences have been identified via bulk techniques
such as, for example, those described above, the differential expression of such putatively
differentially expressed genes should be corroborated. Corroboration may be accomplished via, for
example, such well known techniques as Northern analysis and/or RT-PCR [...] target
and/or marker genes, as discussed below.

[...] differential display may be used to isolate full length clones of the corresponding gene. The full
length coding portion of the gene may readily be isolated, without undue experimentation, by
molecular biological techniques well known in the art. For example, the isolated differentially
expressed amplified fragment may be labeled and used to screen a cDNA library. Alternatively, the
labeled fragment may be used to screen a genomic library.

An analysis of the tissue distribution of the mRNA produced by the identified genes may be 
conducted, utilizing standard techniques well known to those of skill in the art. Such techniques 
may include, for example, Northern analyses and RT-PCR. Such analyses provide information as 
to whether the identified genes are expressed in tissues expected to contribute to breast cancer. 
Such analyses may also provide quantitative information regarding steady state mRNA regulation, 
yielding data concerning which of the identified genes exhibits a high level of regulation in, 
preferably, tissues which may be expected to contribute to breast cancer. 



Such analyses may also be performed on an isolated cell population of a particular cell type 
derived from a given tissue. Additionally, standard in situ hybridization techniques may be utilized 
to provide information regarding which cells within a given tissue express the identified gene. 
Such analyses may provide information regarding the biological function of an identified gene 
relative to breast cancer in instances wherein only a subset of the cells within the tissue is thought 
to be relevant to breast cancer.

Extending Polynucleotides 

In one embodiment of such a procedure for the identification and cloning of full length gene
sequences, RNA may be isolated, following standard procedures, from an appropriate tissue or
cellular source. A reverse transcription reaction may then be performed on the RNA using an
oligonucleotide primer complementary to the mRNA that corresponds to the amplified fragment,
for the priming of first strand synthesis. Because the primer is anti-parallel to the mRNA,
extension will proceed toward the 5' end of the mRNA. The resulting RNA hybrid may then be
"tailed" with guanines using a standard terminal transferase reaction, the hybrid may be digested
with RNase H, and second strand synthesis may then be primed with a poly-C primer. Using the
two primers, the 5' portion of the gene is amplified using PCR. Sequences obtained may then be
isolated and recombined with previously isolated sequences to generate a full-length cDNA of the
differentially expressed genes of the invention. For a review of cloning strategies and recombinant
DNA techniques, see e.g., Sambrook et al., (6); and Ausubel et al., (7).

Various PCR-based methods can be used to extend the polynucleotide sequences disclosed herein
to detect upstream sequences such as promoters and regulatory elements. For example, restriction
site PCR uses universal primers to retrieve unknown sequence adjacent to a known locus [Sarkar,
1993, (11)]. Genomic DNA is first amplified in the presence of a primer to a linker sequence and a
primer specific to the known region. The amplified sequences are then subjected to a second round
of PCR with the same linker primer and another specific primer internal to the first one. Products
of each round of PCR are transcribed with an appropriate RNA polymerase and sequenced using
reverse transcriptase.

Inverse PCR also can be used to amplify or extend sequences using divergent primers based on a
known region [Triglia et al., 1988, (12)]. Primers can be designed using commercially available
software, such as OLIGO 4.06 Primer Analysis software (National Biosciences Inc., Plymouth,
Minn.), to be e.g. 22-30 nucleotides in length, to have a GC content of 50% or more, and to anneal
to the target sequence at temperatures about 68-72°C. The method uses several restriction enzymes
to generate a suitable fragment in the known region of a gene. The fragment is then circularized by
intramolecular ligation and used as a PCR template.
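
Two of the primer design criteria named above (a length of 22 to 30 nucleotides and a GC content of 50% or more) are easy to check programmatically. The sketch below is only an illustration with hypothetical function names and a made-up primer; it does not reproduce the OLIGO software and deliberately leaves out the annealing temperature criterion, which would require a melting temperature model.

```python
def gc_content(primer):
    """Percentage of G and C bases in a primer sequence."""
    primer = primer.upper()
    if not primer:
        raise ValueError("primer must not be empty")
    return 100.0 * (primer.count("G") + primer.count("C")) / len(primer)

def meets_length_and_gc(primer, min_len=22, max_len=30, min_gc=50.0):
    """Check the length window and the minimum GC content only."""
    return min_len <= len(primer) <= max_len and gc_content(primer) >= min_gc

# Hypothetical 22-mer with 14 G/C bases.
candidate = "GCGCATATGCGCATGCGCATGC"
ok = meets_length_and_gc(candidate)
```

A candidate failing either test would be discarded before any annealing temperature is estimated.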

Another method which can be used is capture PCR, which involves PCR amplification of DNA
fragments adjacent to a known sequence in human and yeast artificial chromosome DNA
[Lagerstrom et al., 1991, (13)]. In this method, multiple restriction enzyme digestions and
ligations also can be used to place an engineered double-stranded sequence into an unknown
fragment of the DNA molecule before performing PCR.

Additionally, PCR, nested primers, and PROMOTERFINDER libraries (CLONTECH, Palo Alto,
Calif.) can be used to walk genomic DNA. This process avoids
the need to screen libraries and is useful in finding intron/exon junctions.

The sequences of the identified genes may be used, utilizing standard techniques, to place the
genes onto genetic maps, e.g., mouse [Copeland & Jenkins, 1991, (14)] and human genetic maps
[Cohen et al., 1993, (15)]. Such mapping information may yield information regarding the genes'
importance to human disease by, for example, identifying genes which map near genetic regions to
which known genetic breast cancer tendencies map.

Variants and homologues of the "BREAST CANCER GENE" [...] "BREAST
CANCER GENE" polynucleotide sequences can be identified by hybridization of candidate
polynucleotides to known "BREAST CANCER GENE" polynucleotides under stringent
conditions, as is known in the art. For example, using the following wash conditions: 2X SSC
(0.3 M NaCl, 0.03 M sodium citrate, pH 7.0), 0.1% SDS, room temperature twice, 30 minutes each;
then 2X SSC, 0.1% SDS, 50°C once, 30 minutes; then 2X SSC, room temperature twice, 10
minutes each, homologous sequences can be identified which contain at most about 25-30%
basepair mismatches. More preferably, homologous polynucleotide strands contain 15-25%
basepair mismatches, even more preferably 5-15% basepair mismatches.

Species homologues of the "BREAST CANCER GENE" polynucleotides disclosed herein also can
be identified by making suitable probes or primers and screening cDNA expression libraries from
other species, such as mice, monkeys, or yeast. Human variants of "BREAST CANCER GENE"
polynucleotides can be identified, for example, by screening human cDNA expression libraries. It
is well known that the Tm of a double-stranded DNA decreases by 1-1.5°C with every 1% decrease
in homology [Bonner et al., 1973, (16)]. Variants of human "BREAST CANCER GENE"
polynucleotides or "BREAST CANCER GENE" polynucleotides of other species can therefore be
identified by hybridizing a putative homologous "BREAST CANCER GENE" polynucleotide with
a polynucleotide having a nucleotide sequence of one of the sequences of the SEQ ID NO: 1 to 165
and 472 to 491 or the complement thereof to form a test hybrid. The melting temperature of the
test hybrid is compared with the melting temperature of a hybrid comprising polynucleotides
having perfectly complementary nucleotide sequences, and the number or percent of basepair
mismatches within the test hybrid is calculated.
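
The mismatch estimate described above follows directly from the stated rule that the Tm drops by 1 to 1.5°C per 1% decrease in homology. The sketch below turns an observed melting temperature difference into the implied range of percent mismatch; the function name and the example temperatures are hypothetical.

```python
def mismatch_range_from_tm(tm_perfect, tm_test, drop_low=1.0, drop_high=1.5):
    """Return (min %, max %) basepair mismatch implied by the Tm difference in degrees C."""
    delta = tm_perfect - tm_test
    if delta < 0:
        raise ValueError("the test hybrid should not melt above the perfect hybrid")
    # A larger assumed drop per 1% mismatch gives the smaller mismatch estimate.
    return delta / drop_high, delta / drop_low

# Hypothetical measurement: the test hybrid melts 15 degrees C below the perfect hybrid.
low, high = mismatch_range_from_tm(80.0, 65.0)
```

For the example values this yields an estimated 10 to 15% basepair mismatch within the test hybrid.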

Nucleotide sequences which hybridize to "BREAST CANCER GENE" polynucleotides or their
complements following stringent hybridization and/or wash conditions also are "BREAST
CANCER GENE" polynucleotides. Stringent wash conditions are well known and understood in
the art and are disclosed, for example, in Sambrook et al., (6). Typically, for stringent
hybridization conditions a combination of temperature and salt concentration should be chosen
that is approximately 12 to 20°C below the calculated Tm of the hybrid under study. The Tm of a
hybrid between a "BREAST CANCER GENE" polynucleotide having a nucleotide sequence of
one of the sequences of the SEQ ID NO: 1 to 165 and 472 to 491 or the complement thereof and a
polynucleotide sequence which is at least about 50, preferably about 75, 90, 96, or 98% identical
to one of those nucleotide sequences can be calculated, for example, using the equation below
[Bolton and McCarthy, 1962, (17)]:

Tm = 81.5°C - 16.6(log10[Na+]) + 0.41(%G + C) - 0.63(%formamide) - 600/l,
where l = the length of the hybrid in basepairs.

Stringent wash conditions include, for example, 4X SSC at 65°C, or 50% formamide, 4X SSC at
28°C, or 0.5X SSC, 0.1% SDS at 65°C. Highly stringent wash conditions include, for example,
0.2X SSC at 65°C.

The biological function of the identified genes may be more directly assessed by utilizing relevant
in vivo and in vitro systems. In vivo systems may include, but are not limited to, animal systems
which naturally exhibit breast cancer predisposition, or ones which have been engineered to
exhibit such symptoms, including but not limited to the oncogene overexpression (e.g. HER2/neu,
ras, raf, or EGFR) malignant neoplasia mouse.

Splice variants derived from the same genomic region, encoded by the same pre mRNA, can be
identified by hybridization conditions described above for homology search. The specific
characteristics of variant proteins encoded by splice variants of the same pre transcript may differ
and can also be assayed as disclosed. A "BREAST CANCER GENE" polynucleotide having a
nucleotide sequence of one of the sequences of the SEQ ID NO: 1 to 165 and 472 to 491 or the
complement thereof may therefore differ in parts of the entire sequence. The prediction of splicing
events and the identification of the utilized acceptor and donor sites within the pre mRNA can be
computed (e.g. Software Package GRAIL or GenomeSCAN) and verified by PCR methods by those
with skill in the art.

Antisense oligonucleotides 

Antisense oligonucleotides are nucleotide sequences which are complementary to a specific DNA
or RNA sequence. Once introduced into a cell, the complementary nucleotides combine with
natural sequences produced by the cell to form complexes and block either transcription or
translation. Preferably, an antisense oligonucleotide is at least 6 nucleotides in length, but can be at
least 7, 8, 10, 12, 15, 20, 25, 30, 35, 40, 45, or 50 or more nucleotides long. Longer sequences also
can be used. Antisense oligonucleotide molecules can be provided in a DNA construct and
introduced into a cell as described above to alter the level of "BREAST CANCER GENE" gene
products in the cell.
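
Because an antisense oligonucleotide is by definition complementary to its target, a candidate DNA oligonucleotide can be derived as the reverse complement of the target stretch. The sketch below shows this for a target given as a DNA or RNA string; the function name and the example target are hypothetical, and the minimum length of 6 nucleotides is taken from the paragraph above.

```python
# Map each base to its DNA complement; U in an RNA target pairs with A.
_COMPLEMENT = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")

def antisense_dna(target, min_length=6):
    """Return the DNA reverse complement (written 5'->3') of a DNA or RNA target."""
    if len(target) < min_length:
        raise ValueError("target stretch is shorter than the minimum oligonucleotide length")
    return target.translate(_COMPLEMENT)[::-1]

# Hypothetical six-nucleotide stretch of an mRNA.
oligo = antisense_dna("AUGGCC")
```

Here `oligo` is the six-nucleotide DNA sequence that base-pairs with the given mRNA stretch; chemical modifications and secondary-structure considerations are outside the scope of this sketch.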

Antisense oligonucleotides can be deoxyribonucleotides, ribonucleotides, peptide nucleic acids
(PNAs; described in U.S. Pat. No. 5,714,331), locked nucleic acids (LNAs; described in WO
99/12826), or a combination of them. Oligonucleotides can be synthesized manually or by an
automated synthesizer, by covalently linking the 5' end of one nucleotide with the 3' end of another
nucleotide with non-phosphodiester internucleotide linkages such as alkylphosphonates,
phosphorothioates, phosphorodithioates, alkylphosphonothioates, alkylphosphonates,
phosphoramidates, phosphate esters, carbamates, acetamidate, carboxymethyl esters, carbonates, and
phosphate triesters [Brown, 1994, (55); Sonveaux, 1994, (56) and Uhlmann et al., 1990, (57)].

Modifications of "BREAST CANCER GENE" expression can be obtained by designing antisense
oligonucleotides which will form duplexes to the control, 5', or regulatory regions of the
"BREAST CANCER GENE". Oligonucleotides derived from the transcription initiation site, e.g.,
between positions -10 and +10 from the start site, are preferred. Similarly, inhibition can be
achieved using "triple helix" base-pairing methodology. Triple helix pairing is useful because it
causes inhibition of the ability of the double helix to open sufficiently for the binding of
polymerases, transcription factors, or chaperones. Therapeutic advances using triplex DNA have
been described in the literature [Gee et al., 1994, (58)]. An antisense oligonucleotide also can be
designed to block translation of mRNA by preventing the transcript from binding to ribosomes.

Precise complementarity is not required for successful complex formation between an antisense
oligonucleotide and the complementary sequence of a "BREAST CANCER GENE"
polynucleotide. Antisense oligonucleotides which comprise, for example, 2, 3, 4, or 5 or more stretches
of contiguous nucleotides which are precisely complementary to a "BREAST CANCER GENE"
polynucleotide, each separated by a stretch of contiguous nucleotides which are not
complementary to adjacent "BREAST CANCER GENE" nucleotides, can provide sufficient
targeting specificity for "BREAST CANCER GENE" mRNA. Preferably, each stretch of
complementary contiguous nucleotides is at least 4, 5, 6, 7, or 8 or more nucleotides in length.
Non-complementary intervening sequences are preferably 1, 2, 3, or 4 nucleotides in length. One
skilled in the art can easily use the calculated melting point of an antisense-sense pair to determine
the degree of mismatching which will be tolerated between a particular antisense oligonucleotide
and a particular "BREAST CANCER GENE" polynucleotide sequence.
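
The text does not say how the melting point is calculated. As a rough sketch under stated
assumptions, the Python example below uses the Wallace rule (2 °C per A/T and 4 °C per G/C), an
approximation meant for short oligonucleotides, together with an assumed rule of thumb that each
1% of mismatched bases lowers the melting point by about 1 °C. Both the formula choice and the
`drop_per_percent` parameter are illustrative assumptions, not part of the disclosure.

```python
def wallace_tm(oligo: str) -> float:
    """Approximate melting point (deg C) of a short oligo by the Wallace rule."""
    seq = oligo.upper()
    at = sum(seq.count(base) for base in "ATU")
    gc = sum(seq.count(base) for base in "GC")
    return 2.0 * at + 4.0 * gc


def tolerated_mismatches(oligo: str, assay_temp_c: float, drop_per_percent: float = 1.0) -> int:
    """Estimate how many mismatched positions keep the duplex melting point above assay_temp_c.

    drop_per_percent is an assumed rule of thumb (deg C lost per 1% mismatched bases).
    """
    margin = wallace_tm(oligo) - assay_temp_c
    if margin <= 0:
        return 0
    tolerated_percent = margin / drop_per_percent
    return int(len(oligo) * tolerated_percent / 100.0)


if __name__ == "__main__":
    oligo = "TACAATGGCCAT"  # hypothetical 12-mer
    print(wallace_tm(oligo), tolerated_mismatches(oligo, 25.0))
```

For longer oligonucleotides a nearest-neighbor thermodynamic model would normally replace the
Wallace rule; the structure of the estimate stays the same.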

Antisense oligonucleotides can be modified without affecting their ability to hybridize to a
"BREAST CANCER GENE" polynucleotide. These modifications can be internal or at one or
both ends of the antisense molecule. For example, internucleoside phosphate linkages can be
modified by adding cholesteryl or diamine moieties with varying numbers of carbon residues
between the amino groups and terminal ribose. Modified bases and/or sugars, such as arabinose
instead of ribose, or a 3', 5'-substituted oligonucleotide in which the 3' hydroxyl group or the 5'
phosphate group are substituted, also can be employed in a modified antisense oligonucleotide.
These modified oligonucleotides can be prepared by methods well known in the art [Agrawal et
al., 1992, (59); Uhlmann et al., 1987, (57) and Uhlmann et al., 2000, (60)].

Ribozymes

Ribozymes are RNA molecules with catalytic activity [Cech, 1987, (61); Cech, 1990, (62) and
Couture & Stinchcomb, 1996, (63)]. Ribozymes can be used to inhibit gene function by cleaving
an RNA sequence, as is known in the art (e.g., Haseloff et al., U.S. Patent 5,641,673). The
mechanism of ribozyme action involves sequence-specific hybridization of the ribozyme molecule
to complementary target RNA, followed by endonucleolytic cleavage. Examples include
engineered hammerhead motif ribozyme molecules that can specifically and efficiently catalyze
endonucleolytic cleavage of specific nucleotide sequences.

The transcribed sequence of a "BREAST CANCER GENE" can be used to generate ribozymes
which will specifically bind to mRNA transcribed from a "BREAST CANCER GENE" genomic
locus. Methods of designing and constructing ribozymes which can cleave other RNA molecules in
trans in a highly sequence specific manner have been developed and described in the art [Haseloff
et al., 1988, (64)]. For example, the cleavage activity of ribozymes can be targeted to specific
RNAs by engineering a discrete "hybridization" region into the ribozyme. The hybridization region
contains a sequence complementary to the target RNA and thus specifically hybridizes with the
target [see, for example, Gerlach et al., EP 0 321 201].

Specific ribozyme cleavage sites within a "BREAST CANCER GENE" RNA target can be
identified by scanning the target molecule for ribozyme cleavage sites which include the following
sequences: GUA, GUU, and GUC. Once identified, short RNA sequences of between 15 and 20
ribonucleotides corresponding to the region of the target RNA containing the cleavage site can be
evaluated for secondary structural features which may render the target inoperable. Suitability of
candidate "BREAST CANCER GENE" RNA targets also can be evaluated by testing accessibility
to hybridization with complementary oligonucleotides using ribonuclease protection assays.
Longer complementary sequences can be used to increase the affinity of the hybridization
sequence for the target. The hybridizing and cleavage regions of the ribozyme can be integrally
related such that upon hybridizing to the target RNA through the complementary regions, the
catalytic region of the ribozyme can cleave the target.
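
The scanning step described above can be sketched in a few lines of Python. The example below
reports every GUA, GUU, or GUC triplet in a target RNA together with a surrounding window of up
to 20 ribonucleotides that could then be evaluated for secondary structure. The function name,
the way the window is centered, and the example sequence are illustrative assumptions; structure
prediction itself is not attempted here.

```python
CLEAVAGE_TRIPLETS = ("GUA", "GUU", "GUC")


def find_cleavage_sites(target_rna: str, window: int = 20):
    """Return (position, triplet, surrounding_region) for each candidate cleavage site.

    position is the 0-based index of the G of the triplet; the region is a stretch of at
    most `window` ribonucleotides placed roughly around the site (an assumed convention).
    """
    rna = target_rna.upper().replace("T", "U")  # accept a cDNA-style sequence as well
    sites = []
    for i in range(len(rna) - 2):
        triplet = rna[i:i + 3]
        if triplet in CLEAVAGE_TRIPLETS:
            left = max(0, i + 1 - window // 2)
            sites.append((i, triplet, rna[left:left + window]))
    return sites


if __name__ == "__main__":
    hypothetical_target = "AAGUCAAGUAAA"
    for position, triplet, region in find_cleavage_sites(hypothetical_target):
        print(position, triplet, region)
```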

Ribozymes can be introduced into cells as part of a DNA construct. Mechanical methods, such as
microinjection, liposome-mediated transfection, electroporation, or calcium phosphate
precipitation, can be used to introduce a ribozyme-containing DNA construct into cells in which it is
desired to decrease "BREAST CANCER GENE" expression. Alternatively, if it is desired that the
cells stably retain the DNA construct, the construct can be supplied on a plasmid and maintained
as a separate element or integrated into the genome of the cells, as is known in the art. A
ribozyme-encoding DNA construct can include transcriptional regulatory elements, such as a promoter
element, an enhancer or UAS element, and a transcriptional terminator signal, for controlling
transcription of ribozymes in the cells.

As taught in Haseloff et al., U.S. Pat. No. 5,641,673, ribozymes can be engineered so that ribozyme
expression will occur in response to factors which induce expression of a target gene. Ribozymes
also can be engineered to provide an additional level of regulation, so that destruction of mRNA
occurs only when both a ribozyme and a target gene are induced in the cells.

Polypeptides

"BREAST CANCER GENE" polypeptides according to the invention comprise a polypeptide
selected from SEQ ID NO: 166 to 330 and 492 to 511 or encoded by any of the polynucleotide
sequences of the SEQ ID NO: 1 to 165 and 472 to 491 or derivatives, fragments, analogues and
homologues thereof. A "BREAST CANCER GENE" polypeptide of the invention therefore can be
a portion, a full-length, or a fusion protein comprising all or a portion of a "BREAST CANCER
GENE" polypeptide.



Protein Purification 



"BREAST CANCER GENE" polypeptides can be purified from any cell which expresses the
protein, including those which have been transfected with "BREAST CANCER
GENE" expression constructs. A purified "BREAST CANCER GENE" polypeptide is separated
from other compounds which are normally associated with the "BREAST CANCER GENE"
polypeptide in the cell, such as certain proteins, carbohydrates, or lipids, using methods well
known in the art. Such methods include, but are not limited to, size exclusion chromatography,
ammonium sulfate fractionation, ion exchange chromatography, affinity chromatography, and
preparative gel electrophoresis. A preparation of purified "BREAST CANCER GENE"
polypeptides is at least 80% pure; preferably, the preparations are 90%, 95%, or 99% pure. Purity of
the preparations can be assessed by any means known in the art, such as SDS-polyacrylamide gel
electrophoresis.

Obtaining Polypeptides

"BREAST CANCER GENE" polypeptides can be obtained, for example, by purification from
cells, by expression of "BREAST CANCER GENE" polynucleotides, or by direct chemical
synthesis.

Biologically Active Variants

"BREAST CANCER GENE" polypeptide variants which are biologically active, i.e., retain a
"BREAST CANCER GENE" activity, can also be regarded as "BREAST CANCER GENE"
polypeptides. Preferably, naturally or non-naturally occurring "BREAST CANCER GENE"
polypeptide variants have amino acid sequences which are at least about 60, 65, or 70, preferably
about 75, 80, 85, 90, 92, 94, 96, or 98% identical to any of the amino acid sequences of the
polypeptides of SEQ ID NO: 166 to 330 and 492 to 511 or the polypeptides encoded by any of the
polynucleotides of SEQ ID NO: 1 to 165 and 472 to 491 or a fragment thereof. Percent identity
between a putative "BREAST CANCER GENE" polypeptide variant and the polypeptides of
SEQ ID NO: 166 to 330 and 492 to 511, or the polypeptides encoded by any of the polynucleotides
of SEQ ID NO: 1 to 165 and 472 to 491, or a fragment thereof, is determined by conventional
methods (see, for example, Altschul et al., 1986, (19) and Henikoff & Henikoff, 1992, (20)).
Briefly, two amino acid sequences are aligned to optimize the alignment scores using a gap
opening penalty of 10, a gap extension penalty of 1, and the "BLOSUM62" scoring matrix of
Henikoff & Henikoff, 1992, (20).
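
Once two sequences have been aligned, the percent identity itself is simple arithmetic. The
Python sketch below computes it for a pair of already aligned sequences in which gaps are written
as "-". It makes no attempt to perform the alignment, and the choice of denominator (every
alignment column that contains at least one residue) is an assumption for illustration; other
conventions exist and give slightly different values.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Columns where both sequences have a gap are ignored; all other columns count
    toward the denominator (an assumed convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap and b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("alignment contains no residues")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Hypothetical aligned fragments.
    identity = percent_identity("MKT-LLV", "MKTALIV")
    print(round(identity, 1), identity >= 60.0)
```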

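Those skilled in the art appreciate that there are many established algorithms available to align two
amino acid sequences. The "FASTA" similarity search algorithm of Pearson & Lipman is a
suitable protein alignment method for examining the level of identity shared by an amino acid
sequence disclosed herein and the amino acid sequence of a putative variant [Pearson & Lipman,
1988, (21), and Pearson, 1990, (22)]. Briefly, FASTA first characterizes sequence similarity by
identifying regions shared by the query sequence (e.g., one of the disclosed SEQ ID NOs) and a
test sequence that have either the highest density of identities (if the ktup variable is 1) or pairs of
identities (if ktup=2), without considering conservative amino acid substitutions, insertions, or
deletions. The ten regions with the highest density of identities are then rescored by comparing the
similarity of all paired amino acids using an amino acid substitution matrix, and the ends of the
regions are "trimmed" to include only those residues that contribute to the highest score. [...]
examined to determine whether the regions can be joined to form an approximate alignment with
gaps. Finally, the highest scoring regions of the two amino acid sequences are aligned using a
modification of the Needleman-Wunsch-Sellers algorithm (Needleman & Wunsch, 1970; Sellers,
1974) [...] matrix=BLOSUM62. These parameters can be introduced into a FASTA program by modifying
the scoring matrix file ("SMATRIX"), as explained in Appendix 2 of Pearson, (22).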

FASTA can also be used to determine the sequence identity of nucleic acid molecules using a ratio
as disclosed above. For nucleotide sequence comparisons, the ktup value can range between one to
six, preferably from three to six, most preferably three, with other parameters set as default.

Variations in percent identity can be due, for example, to amino acid substitutions, insertions, or
deletions. Amino acid substitutions are defined as one for one amino acid replacements. They are
conservative in nature when the substituted amino acid has similar structural and/or chemical
properties. Examples of conservative replacements are substitution of a leucine with an isoleucine
or valine, an aspartate with a glutamate, or a threonine with a serine.
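
The examples of conservative replacements given above can be captured as a small lookup. The
Python sketch below encodes only those three examples as groups of one-letter amino acid codes;
treating each group as symmetric (so that isoleucine to valine also counts) is an assumption
made for illustration, and a real analysis would rely on a full substitution matrix.

```python
# Groups taken from the examples in the text: Leu/Ile/Val, Asp/Glu, Thr/Ser.
CONSERVATIVE_GROUPS = (frozenset("LIV"), frozenset("DE"), frozenset("TS"))


def is_conservative(original: str, replacement: str) -> bool:
    """Return True if replacing `original` with `replacement` stays within one group."""
    a, b = original.upper(), replacement.upper()
    if a == b:
        return False  # identical residues are not a substitution
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


if __name__ == "__main__":
    for pair in (("L", "I"), ("D", "E"), ("T", "S"), ("L", "D")):
        print(pair, is_conservative(*pair))
```
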

Amino acid insertions or deletions are changes to or within an amino acid sequence. They typically
fall in the range of about 1 to 5 amino acids. Guidance in determining which amino acid residues
can be substituted, inserted, or deleted without abolishing biological or immunological activity of a
"BREAST CANCER GENE" polypeptide can be found using computer programs well known in
the art, such as DNASTAR software. Whether an amino acid change results in a biologically active
"BREAST CANCER GENE" polypeptide can readily be determined by assaying for "BREAST
CANCER GENE" activity, as described, for example, in the specific Examples below.
Insertions or deletions can also be caused by alternative splicing. Protein domains can be inserted
or deleted without altering the main activity of the protein.



Fusion Proteins



Fusion proteins are useful for generating antibodies against "BREAST CANCER GENE"
polypeptide amino acid sequences and for use in various assay systems. For example, fusion
proteins can be used to identify proteins which interact with portions of a "BREAST CANCER
GENE" polypeptide. Protein affinity chromatography or library-based assays for protein-protein
interactions, such as the yeast two-hybrid or phage display systems, can be used for this purpose.
Such methods are well known in the art and also can be used as drug screens.

A "BREAST CANCER GENE" polypeptide fusion protein comprises two polypeptide segments
fused together by means of a peptide bond. The first polypeptide segment comprises at least 25,
50, 75, 100, 150, 200, 300, 400, 500, 600, 700 or 750 contiguous amino acids of an amino acid
sequence encoded by any polynucleotide sequences of the SEQ ID NO: 1 to 165 and 472 to 491 or
of a biologically active variant, such as those described above. The first polypeptide segment also
can comprise full-length "BREAST CANCER GENE".

The second polypeptide segment can be a full-length protein or a protein fragment. Proteins
commonly used in fusion protein construction include β-galactosidase, β-glucuronidase, green
fluorescent protein (GFP), autofluorescent proteins, including blue fluorescent protein (BFP),
glutathione-S-transferase (GST), luciferase, horseradish peroxidase (HRP), and chloramphenicol
acetyltransferase (CAT). Additionally, epitope tags are used in fusion protein constructions,
including histidine (His) tags, FLAG tags, influenza hemagglutinin (HA) tags, Myc tags, VSV-G
tags, and thioredoxin (Trx) tags. Other fusion constructions can include maltose binding protein
(MBP), S-tag, Lex A DNA binding domain (DBD) fusions, GAL4 DNA binding domain fusions,
and herpes simplex virus (HSV) BP16 protein fusions. A fusion protein also can be engineered to
contain a cleavage site located between the "BREAST CANCER GENE" polypeptide-encoding
sequence and the heterologous protein sequence, so that the "BREAST CANCER GENE"
polypeptide can be cleaved and purified away from the heterologous moiety.

A fusion protein can be synthesized chemically, as is known in the art. Preferably, a fusion protein
is produced by covalently linking two polypeptide segments or by standard procedures in the art of
molecular biology. Recombinant DNA methods can be used to prepare fusion proteins, for
example, by making a DNA construct which comprises coding sequences selected from any of the
polynucleotide sequences of the SEQ ID NO: 1 to 165 and 472 to 491 in proper reading frame with
nucleotides encoding the second polypeptide segment and expressing the DNA construct in a host
cell, as is known in the art. Many kits for constructing fusion proteins are available from
companies such as Promega Corporation (Madison, WI), Stratagene (La Jolla, CA), CLONTECH
(Mountain View, CA), Santa Cruz Biotechnology (Santa Cruz, CA), MBL International
Corporation (MIC; Watertown, MA), and Quantum Biotechnologies (Montreal, Canada;
1-888-DNA-KITS).
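
The requirement that the two coding sequences be joined "in proper reading frame" can be checked
mechanically. The following Python sketch joins two coding sequences, optionally through a
linker, after removing a terminal stop codon from the first segment, and rejects constructs whose
segments are not whole codons or that contain a premature in-frame stop. The function name and
the example sequences are hypothetical, and the check covers only the reading frame, not
expression or folding.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def fuse_in_frame(first_cds: str, second_cds: str, linker: str = "") -> str:
    """Join two coding sequences so the second is read in the frame of the first."""
    first, link, second = first_cds.upper(), linker.upper(), second_cds.upper()
    # Drop the terminal stop codon of the first segment so translation reads through.
    if len(first) % 3 == 0 and first[-3:] in STOP_CODONS:
        first = first[:-3]
    for name, part in (("first", first), ("linker", link), ("second", second)):
        if len(part) % 3 != 0:
            raise ValueError(f"{name} segment is not a whole number of codons")
    fused = first + link + second
    codons = [fused[i:i + 3] for i in range(0, len(fused), 3)]
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        raise ValueError("premature in-frame stop codon in the fusion construct")
    return fused


if __name__ == "__main__":
    # Hypothetical mini coding sequences.
    print(fuse_in_frame("ATGGCCTAA", "ATGAAATGA"))
```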

Identification of Species Homologues

Species homologues of a human "BREAST CANCER GENE" polypeptide can be obtained using
"BREAST CANCER GENE" polynucleotides (described below) to make suitable probes or
primers for screening cDNA expression libraries from other species, such as mice, monkeys, or
yeast, identifying cDNAs which encode homologues of a "BREAST CANCER GENE"
polypeptide, and expressing the cDNAs as is known in the art.

Expression of Polynucleotides 

To express a "BREAST CANCER GENE" polynucleotide, the polynucleotide can be inserted into
an expression vector which contains the necessary elements for the transcription and translation of
the inserted coding sequence. Methods which are well known to those skilled in the art can be used
to construct expression vectors containing sequences encoding "BREAST CANCER GENE"
polypeptides and appropriate transcriptional and translational control elements. These methods
include in vitro recombinant DNA techniques, synthetic techniques, and in vivo genetic
recombination. Such techniques are described, for example, in Sambrook et al., (6) and in Ausubel
et al., (7).

A variety of expression vector/host systems can be utilized to contain and express sequences
encoding a "BREAST CANCER GENE" polypeptide. These include, but are not limited to,
microorganisms, such as bacteria transformed with recombinant bacteriophage, plasmid, or cosmid
DNA expression vectors; yeast transformed with yeast expression vectors; insect cell systems
infected with virus expression vectors (e.g., baculovirus); plant cell systems transformed with virus
expression vectors (e.g., cauliflower mosaic virus, CaMV; tobacco mosaic virus, TMV) or with
bacterial expression vectors (e.g., Ti or pBR322 plasmids); or animal cell systems.

The control elements or regulatory sequences are those regions of the vector, such as enhancers
[...] or pSPORT1 plasmid (Life Technologies) [...]. The
baculovirus polyhedrin promoter can be used in insect cells. Promoters or enhancers derived from
the genomes of plant cells (e.g., heat shock, RUBISCO [...]) [...]
necessary to generate a cell line that contains multiple copies of a nucleotide sequence encoding a
"BREAST CANCER GENE" polypeptide, vectors based on SV40 or EBV can be used with an
appropriate selectable marker.

Bacterial and Yeast Expression Systems

In bacterial systems, a number of expression vectors can be selected depending upon the use
intended for the "BREAST CANCER GENE" polypeptide. For example, when a large quantity of
the "BREAST CANCER GENE" polypeptide [...] such
as BLUESCRIPT (Stratagene). In a BLUESCRIPT vector, a sequence encoding the "BREAST
CANCER GENE" polypeptide can be ligated into the vector in frame with sequences for the
amino-terminal Met and the subsequent [...] glutathione S-transferase
(GST). In general, such fusion proteins are soluble and can easily be purified from
lysed cells by adsorption to glutathione-agarose beads followed by elution in the presence of free
glutathione. Proteins made in such systems can be designed to include heparin [...]

In the yeast Saccharomyces cerevisiae, a number of vectors containing constitutive [...]

Plant and Insect Expression Systems

If plant expression vectors are used, the expression of sequences encoding "BREAST CANCER
GENE" polypeptides can be driven by any of a number of promoters. For example, viral promoters
such as the 35S and 19S promoters of CaMV can be used alone or in combination with the omega
leader sequence from TMV [Takamatsu, 1987, (25)]. Alternatively, plant promoters such as the
small subunit of RUBISCO or heat shock promoters can be used [Coruzzi et al., 1984, (26);
Broglie et al., 1984, (27); Winter et al., 1991, (28)]. These constructs can be introduced into plant
cells by direct DNA transformation or by pathogen-mediated transfection. Such techniques are
described in a number of generally available reviews.

An insect system also can be used to express a "BREAST CANCER GENE" polypeptide. For
example, in one such system Autographa californica nuclear polyhedrosis virus (AcNPV) is used
as a vector to express foreign genes in Spodoptera frugiperda cells or in Trichoplusia larvae.
Sequences encoding "BREAST CANCER GENE" polypeptides can be cloned into a nonessential
region of the virus, such as the polyhedrin gene, and placed under control of the polyhedrin
promoter. Successful insertion of sequences encoding "BREAST CANCER GENE" polypeptides
will render the polyhedrin gene inactive and produce recombinant virus lacking coat protein. The
recombinant viruses can then be used to infect S. frugiperda cells or Trichoplusia larvae in which
"BREAST CANCER GENE" polypeptides can be expressed [Engelhard et al., 1994, (29)].

Mammalian Expression Systems

A number of viral-based expression systems can be used to express "BREAST CANCER GENE"
polypeptides in mammalian host cells. For example, if an adenovirus is used as an expression
vector, sequences encoding "BREAST CANCER GENE" polypeptides can be ligated into an
adenovirus transcription/translation complex comprising the late promoter and tripartite leader
sequence. Insertion in a nonessential E1 or E3 region of the viral genome can be used to obtain a
viable virus which is capable of expressing a "BREAST CANCER GENE" polypeptide in infected
host cells [Logan & Shenk, 1984, (30)]. If desired, transcription enhancers, such as the Rous
sarcoma virus (RSV) enhancer, can be used to increase expression in mammalian host cells.

Human artificial chromosomes (HACs) also can be used to deliver larger fragments of DNA than
can be contained and expressed in a plasmid. HACs of 6M to 10M are constructed and delivered to
cells via conventional delivery methods (e.g., liposomes, polycationic amino polymers, or
vesicles).

Specific initiation signals also can be used to achieve more efficient translation of sequences
encoding "BREAST CANCER GENE" polypeptides. Such signals include the ATG initiation
codon and adjacent sequences. In cases where sequences encoding a "BREAST CANCER GENE"
polypeptide, its initiation codon, and upstream sequences are inserted into the appropriate
expression vector, no additional transcriptional or translational control signals may be needed.
However, in cases where only coding sequence, or a fragment thereof, is inserted, exogenous
translational control signals (including the ATG initiation codon) should be provided. The
initiation codon should be in the correct reading frame to ensure translation of the entire insert.
Exogenous translational elements and initiation codons can be of various origins, both natural and
synthetic. The efficiency of expression can be enhanced by the inclusion of enhancers which are
appropriate for the particular cell system which is used.



Host Cells



A host cell strain can be chosen for its ability to modulate the expression of the inserted sequences
or to process the expressed "BREAST CANCER GENE" polypeptide in the desired fashion. Such
modifications of the polypeptide include, but are not limited to, acetylation, carboxylation,
glycosylation, phosphorylation, lipidation, and acylation. Post-translational processing which
cleaves a "prepro" form of the polypeptide also can be used to facilitate correct insertion, folding
and/or function. Different host cells which have specific cellular machinery and characteristic
mechanisms for post-translational activities [...] are available from the American Type Culture
Collection (ATCC; [...]) and can be chosen to ensure the correct modification and processing
of the foreign protein.

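Stable expression is preferred for long-term, high-yield production of recombinant proteins. For
example, cell lines which stably express "BREAST CANCER GENE" polypeptides can be
transformed using expression vectors which can contain viral origins of replication and/or
endogenous expression elements and a selectable marker gene on the same or on a separate vector.
Following the introduction of the vector, cells can be allowed to grow for 1-2 days in an enriched
medium before they are switched to a selective medium. The purpose of the selectable marker is to
confer resistance to selection, and its presence allows growth and recovery of cells which
successfully express the introduced "BREAST CANCER GENE" sequences. Resistant clones of
stably transformed cells can be proliferated using tissue culture techniques appropriate to the cell
type [Freshney et al., 1986, (32)].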

Any number of selection systems can be used to recover transformed cell lines. These include, but
are not limited to, the herpes simplex virus thymidine kinase and
adenine phosphoribosyltransferase [Lowy et al., 1980, (34)] genes which can be employed in tk- or
aprt- cells, respectively. Also, antimetabolite, antibiotic, or herbicide resistance can be used as the
basis for selection. For example, dhfr confers resistance to methotrexate [Wigler et al., 1980, (35)],
npt confers resistance to the aminoglycosides neomycin and G418 [Colbere-Garapin et al., 1981,
(36)], and als and pat confer resistance to chlorsulfuron and phosphinotricin acetyltransferase,
respectively. Additional selectable genes have been described. For example, trpB allows cells to
utilize indole in place of tryptophan, and hisD allows cells to utilize histidinol in place of
histidine [Hartman & Mulligan, 1988, (37)]. Visible markers such as anthocyanins,
β-glucuronidase and its substrate GUS, and luciferase and its substrate luciferin, can be used to
identify transformants and to quantify the amount of transient or stable protein expression
attributable to a specific vector system [Rhodes et al., 1995, (38)].

Detecting Expression and Gene Product

Although the presence of marker gene expression suggests that the "BREAST CANCER GENE"
polynucleotide is also present, its presence and expression may need to be confirmed. For example,
if a sequence encoding a "BREAST CANCER GENE" polypeptide is inserted within a marker
gene sequence, transformed cells containing sequences which encode a "BREAST CANCER
GENE" polypeptide can be identified by the absence of marker gene function. Alternatively, a
marker gene can be placed in tandem with a sequence encoding a "BREAST CANCER GENE"
polypeptide under the control of a single promoter. Expression of the marker gene in response to
induction or selection usually indicates expression of the "BREAST CANCER GENE"
polynucleotide.

Alternatively, host cells which contain a "BREAST CANCER GENE" polynucleotide and which
express a "BREAST CANCER GENE" polypeptide can be identified by a variety of procedures
known to those of skill in the art. These procedures include, but are not limited to, DNA-DNA or
DNA-RNA hybridization and protein bioassay or immunoassay techniques which include
membrane, solution, or chip-based technologies for the detection and/or quantification of
polynucleotide or protein. For example, the presence of a polynucleotide sequence encoding a
"BREAST CANCER GENE" polypeptide can be detected by DNA-DNA or DNA-RNA
hybridization or amplification using probes or fragments of polynucleotides encoding
a "BREAST CANCER GENE" polypeptide. Nucleic acid amplification-based assays involve the
use of oligonucleotides selected from sequences encoding a "BREAST CANCER GENE"
polypeptide to detect transformants which contain a "BREAST CANCER GENE" polynucleotide.

A variety of protocols for detecting and measuring the expression of a "BREAST CANCER
GENE" polypeptide, using either polyclonal or monoclonal antibodies specific for the polypeptide,
are known in the art. Examples include enzyme-linked immunosorbent assay (ELISA),
radioimmunoassay (RIA), and fluorescence activated cell sorting (FACS). A two-site,
monoclonal-based immunoassay using monoclonal antibodies reactive to two non-interfering
epitopes on a "BREAST CANCER GENE" polypeptide can be used, or a competitive binding assay
can be employed.

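A wide variety of labels and conjugation techniques are known by those skilled in the art and can
be used in various nucleic acid and amino acid assays. Means for producing labeled hybridization
or PCR probes for detecting sequences related to polynucleotides encoding "BREAST CANCER
GENE" polypeptides include oligo labeling, nick translation, end-labeling, or PCR amplification
using a labeled nucleotide. Alternatively, sequences encoding a "BREAST CANCER GENE"
polypeptide can be cloned into a vector for the production of an mRNA probe. Such vectors are
known in the art, are commercially available, and can be used to synthesize RNA probes in vitro
by addition of labeled nucleotides and an appropriate RNA polymerase such as T7, T3, or SP6.
These procedures can be conducted using a variety of commercially available kits (Amersham
Pharmacia Biotech, Promega, and US Biochemical). Suitable reporter molecules or labels which
can be used for ease of detection include radionuclides, enzymes, and fluorescent,
chemiluminescent, or chromogenic agents, as well as substrates, cofactors, inhibitors, magnetic
particles, and the like.

Expression and Purification of Polypeptides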

Host cells transformed with nucleotide sequences encoding a "BREAST CANCER GENE"
polypeptide can be cultured under conditions suitable for the expression and recovery of the
protein from cell culture. The polypeptide produced by a transformed cell can be secreted or stored
intracellularly depending on the sequence and/or the vector used. As will be [...] "BREAST
CANCER GENE" polypeptide.

As discussed above, other constructions can be used to join a sequence encoding a "BREAST
CANCER GENE" polypeptide to a nucleotide sequence encoding a polypeptide domain which will
facilitate purification of soluble proteins. Such purification facilitating domains include, but are
not limited to, metal chelating peptides such as histidine-tryptophan modules that allow
purification on immobilized metals, protein A domains that allow purification on immobilized
immunoglobulin, and the domain utilized in the FLAGS extension/affinity purification system
(Immunex Corp., Seattle, Wash.). Inclusion of cleavable linker sequences such as those specific
for Factor Xa or enterokinase (Invitrogen, San Diego, CA) between the purification domain and
the "BREAST CANCER GENE" polypeptide also can be used to facilitate purification. One such
expression vector provides for expression of a fusion protein containing a "BREAST CANCER
GENE" polypeptide and 6 histidine residues preceding a thioredoxin or an enterokinase cleavage
site. The histidine residues facilitate purification by IMAC (immobilized metal ion affinity
chromatography) [Porath et al., 1992, (41)], while the enterokinase cleavage site provides a means
for purifying the "BREAST CANCER GENE" polypeptide from the fusion protein. Vectors which
contain fusion proteins are disclosed in Kroll et al., (42).

Chemical Synthesis 

Sequences encoding a "BREAST CANCER GENE" polypeptide can be synthesized, in whole or
in part, using chemical methods well known in the art (see Caruthers et al., (43) and Horn et al.,
(44)). Alternatively, a "BREAST CANCER GENE" polypeptide itself can be produced using
chemical methods to synthesize its amino acid sequence, such as by direct peptide synthesis using
solid-phase techniques [Merrifield, 1963, (45) and Roberge et al., 1995, (46)]. Protein synthesis
can be performed using manual techniques or by automation. Automated synthesis can be
achieved, for example, using Applied Biosystems 431A Peptide Synthesizer (Perkin Elmer).
Optionally, fragments of "BREAST CANCER GENE" polypeptides can be separately synthesized
and combined using chemical methods to produce a full-length molecule.

The newly synthesized peptide can be substantially purified by preparative high performance
liquid chromatography [Creighton, 1983, (47)]. The composition of a synthetic "BREAST
CANCER GENE" polypeptide can be confirmed by amino acid analysis or sequencing (e.g., the
Edman degradation procedure; see Creighton, (47)). Additionally, any portion of the amino acid
sequence of the "BREAST CANCER GENE" polypeptide can be altered during direct synthesis
and/or combined using chemical methods with sequences from other proteins to produce a variant
polypeptide or a fusion protein.

Production of Altered Polypeptides

As will be understood by those of skill in the art, it may be advantageous to produce "BREAST
CANCER GENE" polypeptide-encoding nucleotide sequences possessing non-naturally occurring
codons. For example, codons preferred by a particular prokaryotic or eukaryotic host can be
selected to increase the rate of protein expression or to produce an RNA transcript having
desirable properties, such as a half-life which is longer than that of a transcript generated from the
naturally occurring sequence.
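
Selecting host-preferred codons amounts to replacing each codon by a synonymous one. The Python
sketch below shows the idea on a deliberately tiny table covering three amino acids. The
preferred codons listed are hypothetical placeholders rather than measured codon usage for any
host, and codons outside the table are left unchanged.

```python
# Partial standard genetic code (DNA codons) for three amino acids only.
CODON_TO_AA = {
    "CTA": "L", "CTG": "L", "TTA": "L",
    "CGA": "R", "CGT": "R", "AGA": "R",
    "GGA": "G", "GGC": "G",
}

# Hypothetical host preference: one preferred codon per amino acid.
PREFERRED_CODON = {"L": "CTG", "R": "CGT", "G": "GGC"}


def recode_for_host(cds: str) -> str:
    """Replace codons with the (assumed) host-preferred synonymous codon where known."""
    seq = cds.upper()
    if len(seq) % 3 != 0:
        raise ValueError("coding sequence must be a whole number of codons")
    recoded = []
    for i in range(0, len(seq), 3):
        codon = seq[i:i + 3]
        amino_acid = CODON_TO_AA.get(codon)
        # Codons not covered by the small table above are kept as they are.
        recoded.append(PREFERRED_CODON[amino_acid] if amino_acid in PREFERRED_CODON else codon)
    return "".join(recoded)


if __name__ == "__main__":
    print(recode_for_host("ATGCTAAGAGGATAA"))
```

Because only synonymous codons are exchanged, the encoded amino acid sequence is unchanged.
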

The nucleotide sequences disclosed herein can be engineered using methods generally known in
the art to alter "BREAST CANCER GENE" polypeptide-encoding sequences for a variety of
reasons, including but not limited to, alterations which modify the cloning, processing, and/or
expression of the polypeptide or mRNA product. DNA shuffling by random fragmentation and
PCR reassembly of gene fragments and synthetic oligonucleotides can be used to engineer the
nucleotide sequences. For example, site-directed mutagenesis can be used to insert new restriction
sites, alter glycosylation patterns, change codon preference, produce splice variants, introduce
mutations, and so forth.

Predictive, Diagnostic and Prognostic Assays

The present invention provides compositions, methods, and kits for determining whether a subject
is at risk for developing malignant neoplasia and breast cancer in particular by detecting the
disclosed biomarkers, i.e., the disclosed polynucleotide markers comprising any of the
polynucleotide sequences of the SEQ ID NO: 1 to 165 and 472 to 491 and/or the polypeptide
markers encoded thereby or polypeptide markers comprising any of the polypeptide sequences of
the SEQ ID NO: 166 to 330 and 492 to 511 for malignant neoplasia and breast cancer in particular.

In clinical applications, biological samples can be screened for the presence and/or absence of the 
biomarkers identified herein. Such samples are for example needle biopsy cores, surgical resection 
samples, or body fluids like serum, thin needle nipple aspirates and urine. For example, these 
methods include obtaining a biopsy, which is optionally fractionated by cryostat sectioning to 
enrich diseased cells to about 80% of the total cell population. In certain embodiments
polynucleotides extracted from these samples may be amplified using techniques well known in
the art. The expression levels of selected markers detected would be compared with statistically 
valid groups of diseased and healthy samples. 

In one embodiment the compositions, methods, and kits comprise determining whether a subject
has an abnormal mRNA and/or protein level of the disclosed markers, such as by Northern blot
analysis, reverse transcription-polymerase chain reaction (RT-PCR), in situ hybridization,
immunoprecipitation, Western blot hybridization, or immunohistochemistry. According to the
method, cells are obtained from a subject and the level of the disclosed biomarkers, at the protein
or mRNA level, is determined and compared to the level of these markers in a healthy subject. An
abnormal level of the biomarker polypeptide or mRNA is likely to be indicative of
malignant neoplasia such as breast cancer.

In another embodiment the compositions, methods, and kits comprise determining whether a
subject has an abnormal DNA content of said genes or said genomic loci, such as by Southern blot
[...] Genomic Hybridization or [...] PCR. In general these assays comprise the [...]
from genomic regions. The probes contain at least parts of said [...]
sequences complementary or analogous to said regions. [...] intra- or [...]
said genes or genomic regions. The probes can [...]
analogous functions (e.g. PNAs, Morpholino oligomers) [...]
hybridization. In general, genomic regions being altered in said patient samples are [...]
unaffected control [...]
unaffected tissue, peripheral blood, or with genomic regions of the same sample that do not [...]
and can therefore serve as internal controls. In a preferred embodiment [...]
on the same chromosome are used. Alternatively, gonosomal regions and/or regions with
defined varying amount in the sample are used. In one favored embodiment the DNA content,
[...] composition or modification is compared [...] distinct
[...]. Especially favored are methods that detect the DNA content of said samples, where the
amount of target regions is altered by amplification and/or deletions. In another embodiment the
target regions are analyzed for the presence of polymorphisms (e.g. Single Nucleotide
Polymorphisms or [...]) that affect [...] the cells in said [...]
of diagnostic, prognostic or therapeutic value. Preferably, the identification of sequence variations
[...] characteristic behavior of said samples [...].

In one embodiment, the compositions, methods, and kits for the prediction, diagnosis or prognosis
of malignant neoplasia and breast cancer in particular are done by the detection of:

(a) a polynucleotide selected from the polynucleotides of the SEQ ID NO: 1 to 165 and 472 to
491;

(b) a polynucleotide which hybridizes under stringent conditions to a polynucleotide specified
in (a) encoding a polypeptide exhibiting the same biological function as specified for the
respective sequence in Table 1a and 1b or 4a and 4b;

(c) a polynucleotide the sequence of which deviates from the polynucleotide specified in (a)
and (b) due to the degeneration of the genetic code encoding a polypeptide exhibiting the
same biological function as specified for the polypeptides of SEQ ID NO: 166 to 330 and
492 to 511;

(d) a polynucleotide which represents a specific fragment, derivative or allelic variation of a
polynucleotide sequence specified in (a) to (c) encoding a polypeptide exhibiting the same
biological function as specified for the respective sequence in Table 1a and 1b or 4a and
4b;

in a biological sample comprising the following steps: hybridizing any polynucleotide or
analogous oligomer specified in (a) to (d) to a polynucleotide material of a biological sample,
thereby forming a hybridization complex; and detecting said hybridization complex.

In another embodiment the method for the prediction, diagnosis or prognosis of malignant 
neoplasia is done as just described but, wherein before hybridization, the polynucleotide material 
of the biological sample is amplified. 

In another embodiment the method for the diagnosis or prognosis of malignant neoplasia and
breast cancer in particular is done by the detection of:

(a) a polynucleotide selected from the polynucleotides of the SEQ ID NO: 166 to 330 and 492
to 511;

(b) a polynucleotide which hybridizes under stringent conditions to a polynucleotide specified
in (a) encoding a polypeptide exhibiting the same biological function as specified for the
respective sequence in Table 1a and 1b or 4a and 4b;

(c) a polynucleotide the sequence of which deviates from the polynucleotide specified in (a)
and (b) due to the degeneration of the genetic code encoding a polypeptide exhibiting the
same biological function as specified for the respective sequence in Table 1a and 1b or 4a
and 4b;

(d) a polynucleotide which represents a specific fragment, derivative or allelic variation of a
polynucleotide sequence specified in (a) to (c) encoding a polypeptide exhibiting the same
biological function as specified for the respective sequence in Table 1a and 1b or 4a and
4b;

(e) a polypeptide encoded by a polynucleotide sequence specified in (a) to (d);

(f) a polypeptide comprising any polypeptide of SEQ ID NO: 166 to 330 and 492 to 511;

comprising the steps of contacting a biological sample with a reagent which specifically interacts
with the polynucleotide specified in (a) to (d) or the polypeptide specified in (e).

[...] a chip can hold up to 410,000 oligonucleotides (GeneChip, Affymetrix). [...] the test by
providing [...] liquids (e.g. derived from fine needle aspirates). The DNA or [...] is then
analyzed with a DNA chip to determine [...] sequences. In one embodiment, the polynucleotide
[...] the probes. Double-stranded polynucleotides, comprising the [...].

The probe polynucleotides can be spotted on substrates including glass, nitrocellulose, [...]. The
probes can be bound to the substrate by either covalent bonds or by [...] such
as hydrophobic interactions. The sample polynucleotides can be labeled [...]
fluorophores, chromophores, etc. Techniques for constructing [...]
used to examine differential expression of genes and can be used to [...]
arrays of the instant [...]
polynucleotide sequences are differentially expressed between [...]
sample. High expression of a particular message in a diseased sample, which is [...]
corresponding normal sample, can indicate a breast cancer specific protein.
Accordingly, in one aspect, the invention provides probes and primers [...]
polynucleotide sequences of SEQ ID NO: 1 to 165 and 472 to 491.

In one embodiment, the composition, method, and kit [...]
the presence of malignant neoplasia or breast cancer cells in the [...].

Specifically, the method comprises:

1) providing a polynucleotide probe comprising a nucleotide sequence at least 12 nucleotides
in length, preferably at least 15 nucleotides, more preferably 25 nucleotides, and most
preferably at least 40 nucleotides, and up to all or nearly all of the coding sequence which
is complementary to a portion of the coding sequence of a polynucleotide selected from the
polynucleotides of SEQ ID NO: 1 to 165 and 472 to 491 or a sequence complementary
thereto;

2) obtaining a tissue sample from a patient with malignant neoplasia;

3) providing a second tissue sample from a patient with no malignant neoplasia;

4) contacting the polynucleotide probe under stringent conditions with RNA of each of said
first and second tissue samples (e.g., in a Northern blot or in situ hybridization assay); and

5) comparing (a) the amount of hybridization of the probe with RNA of the first tissue
sample, with (b) the amount of hybridization of the probe with RNA of the second tissue
sample;

wherein a statistically significant difference in the amount of hybridization with the RNA of the
first tissue sample as compared to the amount of hybridization with the RNA of the second tissue
sample is indicative of malignant neoplasia and breast cancer in particular in the first tissue
sample.
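
The comparison in step 5 can be illustrated with a small sketch. The text does not say which
statistical test is used; the example below assumes replicate hybridization signal measurements
for each tissue sample (hypothetical numbers) and applies an exact two-sided permutation test on
the difference of group means. The function name and the 0.05 cut-off are illustrative choices,
not part of the disclosure.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(first, second):
    """Exact two-sided permutation test on the difference of group means.

    first, second: replicate hybridization signals for the two tissue samples.
    Returns the fraction of regroupings whose mean difference is at least as
    large as the observed one.
    """
    pooled = list(first) + list(second)
    n_first = len(first)
    observed = abs(mean(first) - mean(second))
    total = 0
    extreme = 0
    # Enumerate every way of assigning n_first measurements to the first group.
    for idx in combinations(range(len(pooled)), n_first):
        chosen = set(idx)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if abs(mean(group_a) - mean(group_b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical signal intensities (arbitrary units).
tumor_signal = [10, 11, 12, 13]
normal_signal = [1, 2, 3, 4]
p = permutation_p_value(tumor_signal, normal_signal)
significant = p < 0.05  # illustrative significance threshold
```

With only a few replicates per group the smallest attainable p-value is limited by the number of
possible regroupings, so the choice of test and threshold would depend on the study design.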

2 - Data analysis methods

Comparison of the expression levels of one or more "BREAST CANCER GENES" with reference
expression levels, e.g., expression levels in diseased cells of breast cancer or in normal counterpart
cells, is preferably conducted using computer systems. In one embodiment, expression levels are
obtained in two cells and these two sets of expression levels are introduced into a computer system
for comparison. In a preferred embodiment, one set of expression levels is entered into a computer
system for comparison with values that are already present in the computer system, or in computer
readable form that is then entered into the computer system.

In one embodiment, the invention provides a computer readable form of the gene expression
profile data of the invention, or of values corresponding to the level of expression of at least one
"BREAST CANCER GENE" in a diseased cell. The values can be mRNA expression levels
obtained from experiments, e.g., microarray [...]. The values can also be mRNA levels
normalised relative to a reference gene whose expression is constant in numerous cells under
numerous conditions, e.g., GAPDH. In other embodiments, the values in the computer are ratios
of, or differences between, normalized or non-normalized mRNA levels in different samples.
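
As a minimal sketch of the normalisation and ratio values described above, the following assumes
expression levels are held as a mapping from gene name to measured mRNA level. The gene names
other than GAPDH and all numbers are hypothetical.

```python
def normalize_to_reference(levels, reference_gene="GAPDH"):
    """Divide every mRNA level by the level of a constantly expressed reference gene."""
    reference_level = levels[reference_gene]
    if reference_level <= 0:
        raise ValueError("reference gene level must be positive")
    return {
        gene: value / reference_level
        for gene, value in levels.items()
        if gene != reference_gene
    }


def expression_ratios(sample_a, sample_b):
    """Ratio of levels (sample_a / sample_b) for genes measured in both samples."""
    shared = sample_a.keys() & sample_b.keys()
    return {gene: sample_a[gene] / sample_b[gene] for gene in shared}


# Hypothetical raw measurements for a diseased and a normal sample.
diseased = normalize_to_reference({"GENE_A": 200.0, "GENE_B": 50.0, "GAPDH": 100.0})
normal = normalize_to_reference({"GENE_A": 50.0, "GENE_B": 50.0, "GAPDH": 100.0})
ratios = expression_ratios(diseased, normal)
```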

The gene expression profile data can be in the form of a table, such as an Excel table. The data can
be alone, or it can be part of a larger database, e.g., comprising other expression profiles. For
example, the expression profile data of the invention can be part of a public database. The
computer readable form can be in a computer. In another embodiment, the invention provides a
computer displaying the gene expression profile data.

In one embodiment, the invention provides a method for determining the similarity between the
level of expression of one or more "BREAST CANCER GENES" in a first cell, e.g., a cell of a
subject, and that in a second cell, comprising obtaining the level of expression of one or more
"BREAST CANCER GENES" in a first cell and entering these values into a computer comprising
a database including records comprising values corresponding to levels of expression of one or
more "BREAST CANCER GENES" in a second cell, and processor instructions, e.g., a user
interface, capable of receiving a selection of one or more values for comparison purposes with data
that is stored in the computer. The computer may further comprise a means for converting the
comparison data into a diagram or chart or other type of output.

In another embodiment, values representing expression levels of "BREAST CANCER GENES"
are entered into a computer system, comprising one or more databases with reference expression
levels obtained from more than one cell. For example, the computer comprises expression data of
diseased and normal cells. Instructions are provided to the computer, and the computer is capable
of comparing the data entered with the data in the computer to determine whether the data entered
is more similar to that of a normal cell or of a diseased cell.

In another embodiment, the computer comprises values of expression levels in cells of subjects at
different stages of breast cancer, and the computer is capable of comparing expression data entered
into the computer with the data stored, and producing results indicating to which of the expression
profiles in the computer the one entered is most similar, such as to determine the stage of breast
cancer in the subject.
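
A comparison of this kind can be sketched as a nearest-profile lookup. The text does not specify
a similarity measure; the example assumes Euclidean distance over the genes shared by the entered
profile and each stored profile. The labels, gene names and values are hypothetical.

```python
import math


def profile_distance(query, reference):
    """Euclidean distance between two profiles over the genes they share."""
    genes = sorted(query.keys() & reference.keys())
    if not genes:
        raise ValueError("profiles share no genes")
    return math.sqrt(sum((query[g] - reference[g]) ** 2 for g in genes))


def most_similar_profile(query, references):
    """Return the label of the stored reference profile closest to the query."""
    return min(references, key=lambda label: profile_distance(query, references[label]))


# Hypothetical stored reference profiles, keyed by an annotation.
stored = {
    "normal": {"GENE_A": 1.0, "GENE_B": 1.0},
    "stage I": {"GENE_A": 2.0, "GENE_B": 0.8},
    "stage III": {"GENE_A": 4.0, "GENE_B": 0.2},
}
entered = {"GENE_A": 3.5, "GENE_B": 0.4}
best_match = most_similar_profile(entered, stored)
```

Other measures, such as a correlation coefficient, could be substituted for the distance function
without changing the overall structure.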

In yet another embodiment, the reference expression profiles in the computer are expression
profiles from cells of breast cancer of one or more subjects, which cells are treated in vivo or in
vitro with a drug used for therapy of breast cancer. Upon entering of expression data of a cell of a
subject treated in vitro or in vivo with the drug, the computer is instructed to compare the data
entered to the data in the computer, and to provide results indicating whether the expression data
input into the computer are more similar to those of a cell of a subject that is responsive to the drug
or more similar to those of a cell of a subject that is not responsive to the drug. Thus, the results
indicate whether the subject is likely to respond to the treatment with the drug [...].

In one embodiment, the invention provides a system that comprises a means for receiving gene
expression data for one or a plurality of genes; a means for comparing the gene expression data
from each of said one or plurality of genes to a common reference frame; and a means for
presenting the results of the comparison. This system may further comprise a means for clustering
the data.



In another embodiment, the invention provides a computer program for analyzing gene expression
data comprising (i) a computer code that receives as input gene expression data for a plurality of
genes and (ii) a computer code that compares said gene expression data from each of said plurality
of genes to a common reference frame.

The invention also provides a machine-readable or computer-readable medium including program
instructions for performing the following steps: (i) comparing a plurality of values corresponding
to expression levels of one or more genes characteristic of breast cancer in a query cell with a
database including records comprising reference expression or expression profile data of one or
more reference cells and an annotation of the type of cell; and (ii) indicating to which cell the
query cell is most similar based on similarities of expression profiles. The reference cells can be
cells from subjects at different stages of breast cancer. The reference cells can also be cells from
subjects responding or not responding to a particular drug treatment and optionally incubated in
vitro or in vivo with the drug.

The reference cells may also be cells from subjects responding or not responding to [...]
the computer [...].

Accordingly, the invention provides a method for selecting a therapy for a patient having breast
cancer, the method comprising: (i) providing the level of expression of one or more genes
characteristic of breast cancer in a diseased cell of the patient; (ii) providing a plurality of
reference profiles, each associated with a therapy, wherein the subject expression profile and each
reference profile has a plurality of values, each value representing the level of expression of a gene
characteristic of breast cancer; and (iii) selecting the reference profile most similar to the subject
expression profile, to thereby select a therapy for said patient. In a preferred embodiment step (iii)
is performed by a computer. The most similar reference profile may be selected by weighing a
comparison value of the plurality using a weight value associated with the corresponding
expression data.
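
Steps (i) to (iii), including the optional weighting, might be sketched as follows. The text does
not define the comparison value or how weights are derived; this example assumes a squared
difference per gene multiplied by a per-gene weight, with a default weight of 1.0. Therapy names,
gene names and numbers are hypothetical.

```python
def weighted_distance(subject, reference, weights):
    """Sum of per-gene squared differences, each scaled by that gene's weight."""
    return sum(
        weights.get(gene, 1.0) * (subject[gene] - reference[gene]) ** 2
        for gene in subject
        if gene in reference
    )


def select_therapy(subject, reference_profiles, weights):
    """Return the therapy whose reference profile is closest to the subject profile.

    reference_profiles: list of (therapy_name, profile) pairs.
    """
    therapy, _ = min(
        reference_profiles,
        key=lambda item: weighted_distance(subject, item[1], weights),
    )
    return therapy


subject_profile = {"GENE_A": 2.0, "GENE_B": 1.0}
references = [
    ("therapy_A", {"GENE_A": 2.1, "GENE_B": 3.0}),
    ("therapy_B", {"GENE_A": 1.0, "GENE_B": 1.0}),
]
# Giving GENE_B zero weight makes GENE_A alone decide the match.
choice_weighted = select_therapy(subject_profile, references, {"GENE_A": 1.0, "GENE_B": 0.0})
choice_unweighted = select_therapy(subject_profile, references, {})
```

The two calls return different therapies, which shows how the weight values can change which
reference profile counts as most similar.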

The relative abundance of an mRNA in two biological samples can be scored as a perturbation and
its magnitude determined (i.e., the abundance is different in the two sources of mRNA tested) or
as not perturbed (i.e., the relative abundance is the same). In various embodiments, a difference
between the two sources of RNA of at least a factor of about 25% (RNA from one source is 25%
more abundant in one source than the other source), more usually about 50%, even more often by a
factor of about 2 (twice as abundant), 3 (three times as abundant) or 5 (five times as abundant) is
scored as a perturbation. Perturbations can be used by a computer for calculating expression
comparisons.

Preferably, in addition to identifying a perturbation as positive or negative, it is advantageous to
determine the magnitude of the perturbation. This can be carried out, as noted above, by
calculating the ratio of the emission of the two fluorophores used for differential labeling, or by
analogous methods that will be readily apparent to those of skill in the art.
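
The scoring rule can be written out as a short sketch. It assumes the two abundances are positive
measurements (for example the emission of the two fluorophores) and takes the fold-change
threshold as a parameter, so that 1.25 corresponds to "about 25%" and 2.0 to "twice as abundant".
The function name and return format are illustrative.

```python
def score_perturbation(abundance_a, abundance_b, threshold=2.0):
    """Score the relative abundance of one mRNA in two samples.

    Returns (status, magnitude), where magnitude is the fold difference
    (always >= 1) and status is "positive" when sample A is more abundant,
    "negative" when sample B is more abundant, or "not perturbed" when the
    fold difference is below the threshold.
    """
    if abundance_a <= 0 or abundance_b <= 0:
        raise ValueError("abundances must be positive")
    magnitude = max(abundance_a, abundance_b) / min(abundance_a, abundance_b)
    if magnitude < threshold:
        return ("not perturbed", magnitude)
    status = "positive" if abundance_a > abundance_b else "negative"
    return (status, magnitude)


up = score_perturbation(300.0, 100.0)           # three times as abundant in A
down = score_perturbation(100.0, 250.0)         # 2.5 times as abundant in B
flat = score_perturbation(120.0, 100.0, threshold=1.25)
```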

The computer readable medium may further comprise a pointer to a descriptor of a stage of breast
cancer or to a treatment for breast cancer.

In operation, the means for receiving gene expression data, the means for comparing the gene
expression data, the means for presenting, the means for normalizing, and the means for clustering
within the context of the systems of the present invention can involve a programmed computer
with the respective functionalities described herein, implemented in hardware or hardware and
software; a logic circuit or other component of a programmed computer that performs the
operations specifically identified herein, dictated by a computer program; or a computer memory
encoded with executable instructions representing a computer program that can cause a computer
to function in the particular fashion described herein.

Those skilled in the art will understand that the systems and methods of the present invention may
be applied to a variety of systems, including IBM-compatible personal computers running
MS-DOS or Microsoft Windows.

The computer may have internal components linked to external components. The internal
components may include a processor element interconnected with a main memory. The computer
system can be an Intel Pentium-based processor of 200 MHz or greater clock rate and with 32
MB or more of main memory. The external component may comprise a mass storage, which can be
one or more hard disks (which are typically packaged together with the processor and memory).
Such hard disks are typically of 1 GB or greater storage capacity. Other external components
include a user interface device, which can be a monitor, together with an input device, which
can be a "mouse", or other graphic input devices, and/or a keyboard. A printing device can also be
attached to the computer.

Typically, the computer system is also linked to a network link, which can be part of an Ethernet 
link to other local computer systems, remote computer systems, or wide area communication
networks, such as the Internet. This network link allows the computer system to share data and 
processing tasks with other computer systems. 

Loaded into memory during operation of this system are several software components, which are 
both standard in the art and special to the instant invention. These software components 
collectively cause the computer system to function according to the methods of this invention.
These software components are typically stored on a mass storage. A software component
represents the operating system, which is responsible for managing the computer system and its
network interconnections. This operating system can be, for example, of the Microsoft Windows
family, such as Windows 95, Windows 98, or Windows NT. A software component represents
common languages and functions conveniently present on this system to assist programs
implementing the methods specific to this invention. Many high or low level computer languages
can be used to program the analytic methods of this invention. Instructions can be interpreted
during run-time or compiled. Preferred languages include C/C++, and JAVA. Most preferably,
the methods of this invention are programmed in mathematical software packages which allow
symbolic entry of equations and high-level specification of processing, including algorithms to be
used, thereby freeing a user of the need to procedurally program individual equations or
algorithms. Such packages include Matlab from Mathworks (Natick, Mass.), Mathematica from
Wolfram Research (Champaign, Ill.), or S-Plus from MathSoft (Cambridge, Mass.). Accordingly,
a software component represents the analytic methods of this invention as programmed in a
procedural language or symbolic package. In a preferred embodiment, the computer system also 
contains a database comprising values representing levels of expression of one or more genes 
characteristic of breast cancer. The database may contain one or more expression profiles of genes 
characteristic of breast cancer in different cells. 

In an exemplary implementation, to practice the methods of the present invention, a user first loads
expression profile data into the computer system. These data can be directly entered by the user
from a monitor and keyboard, or from other computer systems linked by a network connection, or
on removable storage media such as a CD-ROM or floppy disk or through the network. Next the
user causes execution of expression profile analysis software which performs the steps of
comparing and, e.g., clustering co-varying genes into groups of genes.
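
One simple way to group co-varying genes is sketched below. The text does not name a clustering
algorithm; this example assumes each gene has a vector of expression values across the same series
of samples, measures co-variation with the Pearson correlation coefficient, and greedily assigns
each gene to the first group whose founding gene it correlates with above a chosen threshold. The
gene names, values and threshold are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length value lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant profile co-varies with nothing
    return cov / (sd_x * sd_y)


def group_covarying_genes(profiles, threshold=0.9):
    """Greedy grouping: join the first group whose founding gene correlates >= threshold."""
    groups = []
    for gene, values in profiles.items():
        for group in groups:
            if pearson(values, profiles[group[0]]) >= threshold:
                group.append(gene)
                break
        else:
            groups.append([gene])
    return groups


# Expression of three hypothetical genes across four samples.
profiles = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.0, 4.0, 6.0, 8.0],
    "GENE_C": [4.0, 3.0, 2.0, 1.0],
}
groups = group_covarying_genes(profiles)
```

Hierarchical or k-means clustering would be common alternatives; the greedy pass is used here
only because it is short and deterministic.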

In another exemplary implementation, expression profiles are compared using a method described
in U.S. Patent No. 6,203,987. A user first loads expression profile data into the computer system.
Geneset profile definitions are loaded into the memory from the storage media or from a remote
computer, preferably from a dynamic geneset database system, through the network. Next the user
causes execution of projection software which performs the steps of converting expression profiles
to projected expression profiles. The projected expression profiles are then displayed.

In yet another exemplary implementation, a user first loads a projected profile into the memory.
The user then causes the loading of a reference profile into the memory. Next, the user causes the
execution of comparison software which performs the steps of objectively comparing the profiles.

3 - Detection of variant polynucleotide sequence

In yet another embodiment, the invention provides methods for determining whether a subject is at
risk for developing a disease, such as a predisposition to develop malignant neoplasia, for example
breast cancer, associated with an aberrant activity of any one of the polypeptides encoded by any
of the polynucleotides of the SEQ ID NO: 1 to 165 and 472 to 491, wherein the aberrant activity of
the polypeptide is characterized by detecting the presence or absence of a genetic lesion
characterized by at least one of these:

(i) an alteration affecting the integrity of a gene encoding a marker polypeptide, or

(ii) the misexpression of the encoding polynucleotide.

To illustrate, such genetic lesions can be detected by ascertaining the existence of at least one of
these:

I. a deletion of one or more nucleotides from the polynucleotide sequence

II. an addition of one or more nucleotides to the polynucleotide sequence

III. a substitution of one or more nucleotides of the polynucleotide sequence

IV. a gross chromosomal rearrangement of the polynucleotide sequence

V. a gross alteration in the level of a messenger RNA transcript of the polynucleotide
sequence

VI. aberrant modification of the polynucleotide sequence, such as of the methylation pattern of
the genomic DNA

VII. the presence of a non-wild type splicing pattern of a messenger RNA transcript of the gene

VIII. a non-wild type level of the marker polypeptide

IX. allelic loss of the gene

X. inappropriate post-translational modification of the marker polypeptide
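
Lesion types I to III in the list above can be illustrated by a deliberately naive comparison of a
sample sequence against a reference sequence. This is only a sketch under strong assumptions:
both sequences cover the same region, a length difference is read as a pure deletion or addition,
and equal-length differences are read as substitutions. Real variant calling would align the
sequences first; the sequences shown are hypothetical.

```python
def classify_lesion(reference, sample):
    """Naively classify the difference between a reference and a sample sequence.

    Returns (kind, positions), where positions lists the 0-based indices of
    substituted nucleotides and is empty for the other kinds.
    """
    if sample == reference:
        return ("none", [])
    if len(sample) < len(reference):
        return ("deletion", [])
    if len(sample) > len(reference):
        return ("addition", [])
    positions = [i for i, (r, s) in enumerate(zip(reference, sample)) if r != s]
    return ("substitution", positions)


wild_type = "ACGT"
results = [classify_lesion(wild_type, s) for s in ("ACGT", "AGT", "ACGGT", "ACTT")]
```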

The present invention provides assay techniques for detecting mutations in the encoding 
polynucleotide sequence. These methods include, but are not limited to, methods involving 
sequence analysis, Southern blot hybridization, restriction enzyme site mapping, and methods 
involving detection of absence of nucleotide pairing between the polynucleotide to be analyzed
and a probe. 

Specific diseases or disorders, e.g., genetic diseases or disorders, are associated with specific 
allelic variants of polymorphic regions of certain genes, which do not necessarily encode a mutated
protein. Thus, the presence of a specific allelic variant of a polymorphic region of a gene in a
subject can render the subject susceptible to developing a specific disease or disorder.
Polymorphic regions in genes can be identified by determining the nucleotide sequence of genes
in populations of individuals. If a polymorphic region is identified, then the link with a specific
disease can be determined by studying specific populations of individuals, e.g. individuals which
developed a specific disease, such as breast cancer. A polymorphic region can be located in any
region of a gene, e.g., exons, in coding or non coding regions of exons, introns, and promoter
region.

In an exemplary embodiment, there is provided a polynucleotide composition comprising a 
polynucleotide probe including a region of nucleotide sequence which is capable of hybridising to 
a sense or antisense sequence of a gene or naturally occurring mutants thereof, or 5' or 3' flanking 
sequences or intronic sequences naturally associated with the subject genes or naturally occurring 
mutants thereof. The polynucleotide of a cell is rendered accessible for hybridization, the probe is 
contacted with the polynucleotide of the sample, and the hybridization of the probe to the sample 
polynucleotide is detected. Such techniques can be used to detect lesions or allelic variants at
either the genomic or mRNA level, including deletions, substitutions, etc., as well as to determine
mRNA transcript levels.

A preferred detection method is allele specific hybridization [...] 30 nucleotides around the
polymorphic region. In a preferred embodiment of the invention, several probes capable of
hybridising specifically to allelic variants are attached to a solid phase support, e.g., a "chip".
Mutation detection analysis using these chips comprising oligonucleotides, also termed "DNA
probe arrays", is described e.g., in Cronin et al. (48). In one embodiment, a chip comprises all the
allelic variants of at least one polymorphic region of a gene. The solid phase support is then
contacted with a test polynucleotide and hybridization to the specific probes is detected.
Accordingly, the identity of numerous allelic variants of one or more genes can be identified in a
simple hybridization experiment.

In certain embodiments, detection of the lesion comprises utilizing the probe/primer in a
polymerase chain reaction (PCR) (see, e.g., U.S. Patent Nos. 4,683,195 and 4,683,202) [...]
1988, (49) and Nakazawa et al., 1994, (50)], the latter of which can be particularly useful for
detecting point mutations in the gene [Abravaya et al., 1995 [...]]. [...] the method includes the
steps of (i) [...], (ii) isolating polynucleotide (e.g., genomic, mRNA or both) from the cells of the
sample, (iii) contacting the polynucleotide sample with one or more primers which specifically
hybridize to a polynucleotide sequence under conditions such that hybridization and amplification
of the polynucleotide (if present) occurs, and (iv) detecting the presence or absence of an
amplification product, or detecting the size of the amplification product and comparing the length
[...]. It is anticipated that PCR and/or LCR may be desirable to use as a preliminary [...].
Alternative amplification methods include: self sustained sequence replication [Guatelli, J.C. et
al., [...]], [...] or any other polynucleotide amplification method, followed by
the detection of the amplified molecules using techniques well known to those of skill in the art.
These detection schemes are especially useful for the detection of polynucleotide molecules if such
molecules are present in very low numbers.

In a preferred embodiment of the subject assay, mutations in, or allelic variants of, a gene from a
sample cell are identified by alterations in restriction enzyme cleavage patterns. For example,
sample and control DNA is isolated, amplified (optionally), digested with one or more restriction
endonucleases, and fragment length sizes are determined [...]. [...]
of sequence specific ribozymes (see, for example, U.S. Patent No. 5,498,531) can be used to [...]
for the presence of specific mutations by development or loss of a [...].
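
The restriction-pattern comparison can be mimicked in silico. The sketch below assumes linear DNA
and a single enzyme, using the EcoRI recognition sequence GAATTC (cut after the first G) as the
default; the short sample and control sequences are hypothetical and only serve to show how a
point mutation that destroys a site changes the fragment lengths.

```python
def fragment_lengths(sequence, site="GAATTC", cut_offset=1):
    """Lengths of the fragments produced by cutting a linear sequence at every site.

    cut_offset is the position within the recognition site after which the
    enzyme cuts (1 for EcoRI: G^AATTC).
    """
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    bounds = [0] + cuts + [len(sequence)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


def cleavage_pattern_differs(sample, control, site="GAATTC", cut_offset=1):
    """True when sample and control yield different sets of fragment lengths."""
    return sorted(fragment_lengths(sample, site, cut_offset)) != sorted(
        fragment_lengths(control, site, cut_offset)
    )


control_dna = "AAGAATTCTT"   # contains one EcoRI site
sample_dna = "AAGAGTTCTT"    # A-to-G substitution destroys the site
control_fragments = fragment_lengths(control_dna)
sample_fragments = fragment_lengths(sample_dna)
altered = cleavage_pattern_differs(sample_dna, control_dna)
```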

4 - In situ hybridization

In one aspect, the method comprises in situ hybridization with a probe derived from a given marker
polynucleotide [...] SEQ ID NO: 1 to 165 and 472 to 491 or a sequence complementary thereto.
The method comprises contacting the labeled hybridization probe with a sample of a given type of
tissue from a patient potentially having malignant neoplasia and breast cancer in particular as well
as normal tissue [...] to a degree significantly different (e.g., by at least a factor of [...]) [...].

Polypeptide detection

The subject invention further provides a method of determining whether a cell sample obtained
from a subject possesses an abnormal amount of marker polypeptide which comprises (a)
obtaining a cell sample from the subject, (b) quantitatively determining the amount of the marker
polypeptide in the sample so obtained, and (c) comparing the amount of the marker polypeptide so
determined with a known standard, so as to thereby determine whether the cell sample [...].
[...] polypeptides may be detected by immunohistochemical assays, dot-blot assays, ELISA and the
like.



Antibodies 



Any type of antibody known in the art can be generated to bind specifically to an epitope of a
"BREAST CANCER GENE" polypeptide. An antibody as used herein includes intact
immunoglobulin molecules, as well as fragments thereof, such as Fab, F(ab')2, and Fv, which are
capable of binding an epitope of a "BREAST CANCER GENE" polypeptide. Typically, at least 6,
8, 10, or 12 contiguous amino acids are required to form an epitope. However, epitopes which
involve non-contiguous amino acids may require more, e.g., at least 15, 25, or 50 amino acids.

An antibody which specifically binds to an epitope of a "BREAST CANCER GENE" polypeptide
can be used therapeutically, as well as in immunochemical assays, such as Western blots, ELISAs,
radioimmunoassays, immunohistochemical assays, immunoprecipitations, or other
immunochemical assays known in the art. Various immunoassays can be used to identify
antibodies having [...] are well known in the art. Such immunoassays typically [...].

Typically, an antibody which specifically binds to a "BREAST CANCER GENE" polypeptide
provides a detection signal at least 5-, 10-, or 20-fold higher than a detection signal provided with
other proteins when used in an immunochemical assay. Preferably, antibodies which specifically
bind to "BREAST CANCER GENE" polypeptides [...].

"BREAST CANCER GENE" polypeptides can be used to immunize a mammal, such as a mouse,
rat, rabbit, guinea pig, [...] or human, to produce polyclonal antibodies. If desired, a
"BREAST CANCER GENE" polypeptide can be conjugated to a carrier protein, such as bovine
serum albumin, thyroglobulin, and keyhole limpet hemocyanin. Depending on the host species,
[...] are not limited to, Freund's adjuvant, mineral gels (e.g., aluminum hydroxide), and [...]
substances (e.g. lysolecithin, pluronic polyols, [...] oil [...]). [...]
and Corynebacterium parvum are especially useful.

Monoclonal antibodies which specifically bind to a "BREAST CANCER GENE" polypeptide can
be prepared using any technique which provides for the production of antibody molecules by
continuous cell lines in culture. These techniques include, but are not limited to, [...]
[[...] et al., 1985, (65); Kozbor et al., 1985, (66); Cote et al., 1983, (67) and Cole et al., 1984, [...]].

In addition, techniques developed for the production of chimeric antibodies, the splicing of [...]
antibody genes to human antibody genes to obtain a molecule with appropriate [...]
activity, can be used [[...] et al., 1984, [...]; [...] 1985, (71)]. Monoclonal and other antibodies
also can be humanized to prevent a patient from mounting an immune response against the
antibody when it is used therapeutically. [...] therapy or may require alteration of a few key
residues. Sequence differences between [...] antibodies and human sequences can be minimized by
replacing residues which differ from those in the human sequences by site directed mutagenesis of
individual residues or by grafting of entire complementarity determining regions. Alternatively,
humanized antibodies can be produced using recombinant methods, as described in GB2188638B.
Antibodies which specifically bind to a "BREAST CANCER GENE" polypeptide can contain
antigen binding sites which are either partially or fully humanized, as disclosed in U.S. Patent
5,565,332.

Alternatively, techniques described for the production of single chain antibodies can be adapted 
using methods known in the art to produce single chain antibodies which specifically bind to 
"BREAST CANCER GENE" polypeptides. Antibodies with related specificity, but of distinct
idiotypic composition, can be generated by chain shuffling from random combinatorial 
immunoglobulin libraries [Burton, 1991, (72)]. 

Single-chain antibodies also can be constructed using a DNA amplification method, such as PCR,
using hybridoma cDNA as a template [Thirion et al., 1996, (73)]. Single-chain antibodies can be
mono- or bispecific, and can be bivalent or tetravalent. Construction of tetravalent, bispecific
single-chain antibodies is taught, for example, in Coloma & Morrison, (74). Construction of
bivalent, bispecific single-chain antibodies is taught in Mallender & Voss, (75).

A nucleotide sequence encoding a single-chain antibody can be constructed using manual or
automated nucleotide synthesis, cloned into an expression construct using standard recombinant
DNA methods, and introduced into a cell to express the coding sequence, as described below.
Alternatively, single-chain antibodies can be produced directly using, for example, filamentous
phage technology [Verhaar et al., 1995, (76); Nicholls et al., 1993, (77)].

Antibodies which specifically bind to "BREAST CANCER GENE" polypeptides also can be
produced by inducing in vivo production in the lymphocyte population or by screening
immunoglobulin libraries or panels of highly specific binding reagents as disclosed in the literature
[Orlandi et al., 1989, (78) and Winter et al., 1991, (79)].

Other types of antibodies can be constructed and used therapeutically in methods of the invention.
For example, chimeric antibodies can be constructed as disclosed in WO 93/03151. Binding
proteins which are derived from immunoglobulins and which are multivalent and multispecific,
such as the antibodies described in WO 94/13804, also can be prepared.

Antibodies according to the invention can be purified by methods well known in the art. For
example, antibodies can be affinity purified by passage over a column to which a "BREAST
CANCER GENE" polypeptide is bound. The bound antibodies can then be eluted from the column
using a buffer with a high salt concentration.

Immunoassays are commonly used to quantify the levels of proteins in cell samples, and many 
other immunoassay techniques are known in the art. The invention is not limited to a particular 
assay procedure, and therefore is intended to include both homogeneous and heterogeneous 
procedures. Exemplary immunoassays which can be conducted according to the invention include
fluorescence polarization immunoassay (FPIA), fluorescence immunoassay (FIA), enzyme
immunoassay (EIA), nephelometric inhibition immunoassay (NIA), enzyme linked immunosorbent
assay (ELISA), and radioimmunoassay (RIA). An indicator moiety, or label group, can be attached
to the subject antibodies and is selected so as to meet the needs of various uses of the method,
which are often dictated by the availability of assay equipment and compatible immunoassay
procedures. General techniques to be used in performing the various immunoassays noted above
are known to those of ordinary skill in the art. 

Other methods to quantify the level of a particular protein, or a protein fragment or modified
protein, in a particular sample are based on flow-cytometric methods. Flow cytometry allows the
identification of proteins on the cell surface as well as of intracellular proteins using fluorochrome
labeled, protein specific antibodies or non-labeled antibodies in combination with fluorochrome
labeled secondary antibodies. General techniques to be used in performing the flow cytometric
assays noted above are known to those of ordinary skill in the art. A special method based on the
same principles is microsphere-based flow cytometry. Microsphere beads are labeled [...]
Inc., WO 97/14028. In another embodiment the level of a particular protein or a protein fragment
or modified protein in a particular sample may be determined by 2D gel electrophoresis and/or
mass spectrometry. Determination of protein nature, sequence, molecular mass as well as charge
can be achieved in one detection step. Mass spectrometry can be performed with methods known
to those with skills in the art, such as MALDI, TOF, or combinations of these.

In another embodiment the level of the encoded product, i.e., the product encoded by any of the
polynucleotide sequences of the SEQ ID NO: 1 to 165 and 472 to 491 or a sequence
complementary thereto, in a biological fluid (e.g., blood or urine) of a patient may be determined
as a way of monitoring the level of expression of the marker polynucleotide sequence in cells of
that patient. Such a method would include the steps of obtaining a sample of a biological fluid
from the patient, contacting the sample (or proteins from the sample) with an antibody specific for
an encoded marker polypeptide, and determining the amount of immune complex formation by the
antibody, with the amount of immune complex formation being indicative of the level of the
marker encoded product in the sample. This determination is particularly instructive when
compared to the amount of immune complex formation by the same antibody in a control sample.
In another embodiment, the method can be used to determine the amount of marker polypeptide
present in a cell, which in turn can be correlated with progression of the disorder. The level of
the marker polypeptide [...] cells. The observation of marker polypeptide level can be utilized in
decisions regarding, e.g., the use of more stringent therapies.



As set out above, one aspect of the present invention relates to diagnostic assays for determining,
in the context of cells isolated from a patient, if the level of a marker polypeptide is significantly
reduced in the sample cells. The term "significantly reduced" refers to a cell phenotype wherein
the cell possesses a reduced cellular amount of the marker polypeptide relative to a normal cell of
similar tissue origin. For example, a cell may have less than about 50%, 25%, 10%, or 5% of the
marker polypeptide that a normal control cell has. In particular, the assay evaluates the level of
marker polypeptide in the test cells and, preferably, compares the measured level with marker
polypeptide detected in at least one control cell, e.g., a normal cell and/or a transformed cell of
known phenotype.
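
The "significantly reduced" criterion can be written as a simple ratio test. The following is only a minimal sketch, assuming marker polypeptide levels have already been quantified in identical arbitrary units for the test cell and a normal control cell; the function name and the default 50% cutoff are illustrative choices and not part of the described assay.

```python
def is_significantly_reduced(test_level, control_level, cutoff=0.50):
    """Return True if the test cell holds less than `cutoff` (a fraction)
    of the marker polypeptide found in the normal control cell.

    Hypothetical helper: cutoffs of 0.50, 0.25, 0.10 or 0.05 mirror the
    example percentages named in the text.
    """
    if control_level <= 0:
        raise ValueError("control level must be positive")
    return (test_level / control_level) < cutoff


# Illustrative values in arbitrary units (not measured data).
print(is_significantly_reduced(20.0, 100.0))               # 20% of control, 50% cutoff
print(is_significantly_reduced(20.0, 100.0, cutoff=0.10))  # same cell, 10% cutoff
```

A stricter cutoff flags fewer cells, so the choice of cutoff should be fixed before samples are scored.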



Of particular importance to the subject invention is the ability to quantitate the level of marker
polypeptide as determined by the number of cells associated with a normal or abnormal marker
polypeptide level. The number of cells with a particular marker polypeptide phenotype may then
be correlated with patient prognosis. In one embodiment of the invention, the marker polypeptide
phenotype of the lesion is determined as a percentage of cells in a biopsy which are found to have
abnormally high/low levels of the marker polypeptide. Such expression may be detected by
immunohistochemical assays, dot-blot assays, ELISA and the like.
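
Expressing the lesion phenotype as a percentage of abnormal cells amounts to a count over per-cell scores. The sketch below assumes each cell in the biopsy has already been assigned a numeric marker level and that a normal range is known; the function name, the bounds, and the sample scores are hypothetical.

```python
def abnormal_cell_percentage(levels, low, high):
    """Percentage of cells whose marker level lies outside [low, high]."""
    if not levels:
        raise ValueError("no cells scored")
    abnormal = sum(1 for x in levels if x < low or x > high)
    return 100.0 * abnormal / len(levels)


# Illustrative per-cell scores relative to a normal level of 1.0.
scores = [0.2, 0.9, 1.0, 1.1, 3.5, 0.95, 1.05, 0.1]
print(abnormal_cell_percentage(scores, low=0.5, high=2.0))
```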



Immunohistochemistry



Where tissue samples are employed, immunohistochemical staining may be used to determine the
number of cells having the marker polypeptide phenotype. For such staining, tissue is
taken from the biopsy or other tissue sample and subjected to proteolytic hydrolysis, employing
such agents as protease K or pepsin. In certain embodiments, it may be desirable to isolate a
nuclear fraction from the sample cells and detect the level of the marker polypeptide in the nuclear
fraction.

The tissue samples are fixed by treatment with a reagent such as formalin, glutaraldehyde,
or the like. The samples are then incubated with an antibody with binding specificity for the
marker polypeptides. This antibody may be conjugated to a label for subsequent detection of
binding. Samples are incubated for a time sufficient for
formation of the immunocomplexes. Binding of the antibody is then detected by virtue of a label
conjugated to this antibody. Where the antibody is unlabeled, a second labeled antibody may be
employed, e.g., one which is specific for the isotype of the anti-marker polypeptide antibody.
Examples of labels which may be employed include radionuclides, fluorescence,
chemoluminescence, and enzymes.



Where enzymes are employed, the substrate for the enzyme may be added to the samples to
provide a colored or fluorescent product. Examples of suitable enzymes for use in conjugates
include horseradish peroxidase, alkaline phosphatase, malate dehydrogenase and the like. Where
not commercially available, such antibody-enzyme conjugates are readily produced by techniques
known to those skilled in the art.

In one embodiment, the assay is performed as a dot blot assay. The dot blot assay finds particular
application where tissue samples are employed, as it allows determination of the average amount of
the marker polypeptide associated with a single cell by correlating the amount of marker
polypeptide in a cell-free extract produced from a predetermined number of cells.

In yet another embodiment, the invention contemplates using a panel of antibodies which are
generated against the marker polypeptides of this invention, which polypeptides are encoded by
any of the polynucleotide sequences of the SEQ ID NO: 1 to 165 and 472 to 491. Such a panel of
antibodies may be used as a reliable diagnostic probe for breast cancer. The assay of the present
invention comprises contacting a biopsy sample containing cells, e.g., macrophages, with a panel
of antibodies to one or more of the encoded products to determine the presence or absence of the
marker polypeptides.

The diagnostic methods of the subject invention may also be employed as follow-up to treatment,
e.g., quantification of the level of marker polypeptides may be indicative of the effectiveness of
current or previously employed therapies for malignant neoplasia and breast cancer in particular as
well as the effect of these therapies upon patient prognosis.

The diagnostic assays described above can be adapted to be used as prognostic assays, as well.
Such an application takes advantage of the sensitivity of the assays of the invention to events
which take place at characteristic stages in the progression of plaque generation in case of
malignant neoplasia. For example, a given marker gene may be up- or down-regulated at a very
early stage, perhaps before the cell is developing into a foam cell, while another marker gene may
be characteristically up or down regulated only at a much later stage. Such a method could involve
the steps of contacting the mRNA of a test cell with a polynucleotide probe derived from a given
marker polynucleotide which is expressed at different characteristic levels in breast cancer tissue
cells at different stages of malignant neoplasia progression, and determining the approximate
amount of hybridization of the probe to the mRNA of the cell, such amount being an indication of
the level of expression of the gene in the cell, and thus an indication of the stage of disease
progression of the cell; alternatively, the assay can be carried out with an antibody specific for the
gene product of the given marker polynucleotide, contacted with the proteins of the test cell. A
battery of such tests will disclose not only the existence of a certain neoplastic lesion, but also
will allow the clinician to select the mode of treatment most appropriate for the disease, and to
predict the likelihood of success of that treatment.

The methods of the invention can also be used to follow the clinical course of a given breast
cancer predisposition. For example, the assay of the invention can be applied to a blood sample
from a patient; following treatment of the patient, another blood sample is
taken and the test repeated. Successful treatment will result in [...].



Polypeptide activity



In one embodiment the present invention provides a method for screening potentially therapeutic
agents which modulate the activity of one or more "BREAST CANCER GENE" polypeptides,
such that if the activity of the polypeptide is increased as a result of the upregulation of the
"BREAST CANCER GENE" in a subject having or at risk for malignant neoplasia and breast
cancer in particular, the therapeutic substance will decrease the activity of the polypeptide relative
to the activity of the same polypeptide in a subject having or at risk for malignant neoplasia and
breast cancer in particular but not treated with the therapeutic agent. Likewise, if the activity of
the polypeptide as a result of the downregulation of the "BREAST CANCER GENE" is decreased
in a subject having or at risk for malignant neoplasia or breast cancer in particular, the therapeutic
substance will increase the activity of the polypeptide relative to the activity of the same polypeptide
in a subject having or at risk for malignant neoplasia or breast cancer in particular, but not
treated with the therapeutic agent.

The activity of the "BREAST CANCER GENE" polypeptides indicated in Table 2 may be
measured by any means known to those of skill in the art, and which are particular for the type of
activity performed by the particular polypeptide. Examples of specific assays which may be used
to measure the activity of particular polynucleotides are shown below.
a) G protein coupled receptors

In one embodiment, the "BREAST CANCER GENE" polynucleotide may encode a G protein
coupled receptor. In one embodiment, the present invention provides a method of screening
potential modulators (inhibitors or activators) of the G protein coupled receptor by measuring
changes in the activity of the receptor in the presence of a candidate modulator.

1) Gi-coupled receptors

Cells (such as CHO cells or primary cells) are stably transfected with the relevant receptor and
with an inducible CRE-luciferase construct. Cells are grown in 50% Dulbecco's modified Eagle
medium / 50% F12 (DMEM/F12) supplemented with 10% FBS, at 37°C in a humidified
atmosphere with 10% CO2, and are routinely split at a ratio of 1:10 every 2 or 3 days. Test cultures
are seeded into 384-well plates at an appropriate density (e.g. 2000 cells / well in 35 µl cell
culture medium) in DMEM/F12 with FBS, and are grown for 48 hours (range: ~ 24 - 60 hours,
depending on cell line). Growth medium is then exchanged against serum free medium (SFM; e.g.
Ultra-CHO), containing 0,1% BSA. Test compounds dissolved in DMSO are diluted in SFM and
transferred to the test cultures (maximal final concentration 10 µmolar), followed by addition of
forskolin (~ 1 µmolar, final conc.) in SFM + 0,1% BSA 10 minutes later. In case of antagonist
screening, both an appropriate concentration of agonist and forskolin are added. The plates are
incubated at 37°C in 10% CO2 for 3 hours. Then the supernatant is removed and the cells are lysed
with lysis reagent (25 mmolar phosphate-buffer, pH 7,8, containing 2 mmolar DTT, 10% glycerol
and 3% Triton X100). The luciferase reaction is started by addition of substrate-buffer (e.g.
luciferase assay reagent, Promega) and luminescence is immediately determined (e.g. Berthold
luminometer or Hamamatsu camera system).
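
The text does not say how the luminescence readings from this forskolin-stimulated reporter assay are reduced to a result. One plausible normalization, shown here purely as a sketch, expresses each compound well relative to forskolin-only and unstimulated control wells on the same plate. The function name, the control layout, and the numbers are assumptions for illustration.

```python
def percent_inhibition(signal, forskolin_only, basal):
    """Percent reduction of the forskolin-induced luminescence window.

    signal:          luminescence of a well with compound + forskolin
    forskolin_only:  mean luminescence of forskolin-only control wells
    basal:           mean luminescence of unstimulated control wells
    All three are assumed to be in the same relative light units.
    """
    window = forskolin_only - basal
    if window <= 0:
        raise ValueError("forskolin control must exceed basal control")
    return 100.0 * (forskolin_only - signal) / window


# Illustrative relative light units (not measured data).
print(percent_inhibition(signal=6000.0, forskolin_only=11000.0, basal=1000.0))
```

A compound that lowers the signal halfway towards the basal level scores 50; a negative value would indicate a signal above the forskolin-only control.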

2) Gs-coupled receptors

Cells (such as CHO cells or primary cells) are stably transfected with the relevant receptor and
with an inducible CRE-luciferase construct. Cells are grown in 50% Dulbecco's modified Eagle
medium / 50% F12 (DMEM/F12) supplemented with 10% FBS, at 37°C in a humidified
atmosphere with 10% CO2, and are routinely split at a ratio of 1:10 every 2 or 3 days. Test cultures
are seeded into 384-well plates at an appropriate density (e.g. 1000 or 2000 cells / well in 35 µl
cell culture medium) in DMEM/F12 with FBS, and are grown for 48 hours (range: ~ 24 - 60 hours,
depending on cell line). The assay is started by addition of test compounds in serum free medium
(SFM; e.g. Ultra-CHO) containing 0,1% BSA: test compounds are dissolved in DMSO, diluted in
SFM and transferred to the test cultures (maximal final concentration 10 µmolar, DMSO conc. <
0,6 %). In case of antagonist screening an appropriate concentration of agonist is added 5 - 10
minutes later. The plates are incubated at 37°C in 10% CO2 for 3 hours. Then the cells are lysed
with 10 µl lysis reagent per well (25 mmolar phosphate-buffer, pH 7,8, containing 2 mmolar DTT,
10% glycerol and 3% Triton X100) and the luciferase reaction is started by addition of 20 µl
substrate-buffer per well (e.g. luciferase assay reagent, Promega). Measurement of luminescence is
started immediately (e.g. Berthold luminometer or Hamamatsu camera system).

3) Gq-coupled receptors

Cells (such as CHO cells or primary cells) are stably transfected with the relevant receptor. Cells
expressing functional receptor protein are grown in 50% Dulbecco's modified Eagle medium /
50% F12 (DMEM/F12) supplemented with 10% FBS, at 37°C in a humidified atmosphere with
5% CO2, and are routinely split at a cell line dependent ratio every 3 or 4 days. Test cultures are
seeded into 384-well plates at an appropriate density (e.g. 2000 cells / well in 35 µl cell culture
medium) in DMEM/F12 with FBS, and are grown for 48 hours (range: ~ 24 - 60 hours, depending
on cell line). Growth medium is then exchanged against physiological salt solution (e.g. Tyrode
solution). Test compounds dissolved in DMSO are diluted in Tyrode solution containing 0,1%
BSA and transferred to the test cultures (maximal final concentration 10 µmolar). After addition of
the receptor specific agonist the resulting Gq-mediated intracellular calcium increase is measured
using appropriate read-out systems (e.g. calcium-sensitive dyes).

b) Ion channels

Ion channels are integral membrane proteins involved in electrical signaling, transmembrane signal
transduction, and electrolyte and solute transport. By forming macromolecular pores through the
membrane lipid bilayer, ion channels account for the flow of specific ion species driven by the
electrochemical potential gradient for the permeating ion. At the single molecule level, individual
channels undergo conformational transitions ("gating") between the 'open' (ion conducting) and
'closed' (non conducting) state. Typical single channel openings last for a few milliseconds and
result in elementary transmembrane currents in the range of 10^-10 - 10^-12 Ampere. Channel gating is
controlled by various chemical and/or biophysical parameters, such as neurotransmitters and
intracellular second messengers ('ligand-gated' channels) or membrane potential ('voltage-gated'
channels). Ion channels are functionally characterized by their ion selectivity, gating properties,
and regulation by hormones and pharmacological agents. Because of their central role in signaling
and transport processes, ion channels present ideal targets for pharmacological therapeutics in
various pathophysiological settings.
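
To give a feeling for the scale of a single channel opening, the charge carried is the current multiplied by the open time, and dividing by the charge per ion gives the number of ions moved. The sketch below assumes an illustrative current of 1 pA and an open time of 1 ms; both values and the function name are assumptions chosen for the arithmetic, not figures taken from an experiment.

```python
ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs (exact SI value)


def ions_per_opening(current_amp, open_time_s, valence=1):
    """Number of ions crossing the membrane during one channel opening.

    charge = current * time; each ion carries |valence| elementary charges.
    """
    charge = current_amp * open_time_s
    return charge / (abs(valence) * ELEMENTARY_CHARGE)


# Assumed example: 1 pA flowing for 1 ms, monovalent ion.
n = ions_per_opening(1e-12, 1e-3)
print(round(n))
```

Even at the low end of single channel currents, one opening moves thousands of ions, which is why downstream readouts such as ion indicators can detect channel activity.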

In one embodiment, the "BREAST CANCER GENE" may encode an ion channel. In one
embodiment, the present invention provides a method of screening potential activators or
inhibitors of channel activity of the "BREAST CANCER GENE" polypeptide. Screening for
compounds interacting with ion channels to either inhibit or promote their activity can be based
on (1.) binding and (2.) functional assays in living cells [Hille, (112)].

1. For ligand-gated channels, e.g. ionotropic neurotransmitter/hormone receptors, assays can
be designed to detect binding to the target by competition between the compound and a
labeled ligand.

2. Ion channel function can be tested functionally in living cells. Target proteins are either
expressed endogenously in appropriate reporter cells or are introduced recombinantly.
Channel activity can be monitored by (2.1) concentration changes of the permeating ion
(most prominently Ca2+ ions), (2.2) by changes in the transmembrane electrical potential
gradient, and (2.3) by measuring a cellular response (e.g. expression of a reporter gene,
secretion of a neurotransmitter) triggered or modulated by the target activity.

2.1 Channel activity results in transmembrane ion fluxes. Thus activation of ion
channels can be monitored by the resulting changes in intracellular ion
concentrations using luminescent or fluorescent indicators. Because of its wide
dynamic range and the availability of suitable indicators, this applies particularly to
changes in intracellular Ca2+ ion concentration ([Ca2+]i). Calcium flux
through the target channel itself is either direct or indirect: activity of the target
channel affects membrane potential and thereby the activity of co-expressed
voltage-gated Ca2+ channels.
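2.2 Ion channel currents result in changes of electrical membrane potential (Vm), which
can be monitored directly using potentiometric fluorescent probes. These
electrically charged indicators (e.g. the anionic oxonol dye DiBAC4(3))
redistribute between the extra- and intracellular compartments in response to voltage
changes. The equilibrium distribution is governed by the Nernst equation. Thus changes in
membrane potential result in concomitant changes in cellular fluorescence. Again,
changes in Vm might be caused directly by the activity of the target ion channel
or through amplification and/or prolongation of the signal by channels co-expressed
in the same cell.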

2.3 Target channel activity can cause cellular Ca2+ entry either directly or through
activation of additional Ca2+ channels (see 2.1). The resulting intracellular Ca2+
signals regulate a variety of cellular responses, e.g. secretion or gene transcription.
Therefore modulation of the target channel can be detected by monitoring
secretion of a known hormone/transmitter from the target-expressing cell or
through expression of a reporter gene (e.g. luciferase) controlled by a Ca2+-
responsive promoter element (e.g. cyclic AMP/Ca2+-responsive elements; CRE).
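
Item 2.2 above relies on the Nernst equation to link membrane potential to the distribution of a charged indicator. For a freely permeant ion of valence z, the equilibrium ratio of inside to outside concentration is exp(-z·F·Vm / (R·T)). The sketch below evaluates this for a monovalent anionic dye; the temperature of 37°C and the two membrane potentials are assumed example values, and real dyes also bind to membranes and proteins, which this idealized calculation ignores.

```python
import math

R = 8.314462618   # gas constant, J/(mol*K)
F = 96485.33212   # Faraday constant, C/mol


def nernst_ratio(vm_volts, valence=-1, temp_kelvin=310.15):
    """Equilibrium concentration ratio c_in / c_out for a permeant ion.

    Derived from Vm = (R*T / (z*F)) * ln(c_out / c_in).
    """
    return math.exp(-valence * F * vm_volts / (R * temp_kelvin))


# Assumed example potentials: resting (-60 mV) versus depolarized (-30 mV).
for vm in (-0.060, -0.030, 0.0):
    print(f"Vm = {vm * 1000:.0f} mV -> c_in/c_out = {nernst_ratio(vm):.3f}")
```

For an anionic dye the ratio rises as the cell depolarizes, which is consistent with the statement that changes in membrane potential produce concomitant changes in cellular fluorescence.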

c) DNA-binding proteins and transcription factors

In one embodiment, the "BREAST CANCER GENE" may encode a DNA-binding protein or a
transcription factor. The activity of such a DNA-binding protein or a transcription factor may be
measured, for example, by a promoter assay which measures the ability of the DNA-binding
protein or the transcription factor to initiate transcription of a test sequence linked to a particular
promoter. In one embodiment, the present invention provides a method of screening test
compounds for their ability to modulate the activity of such a DNA-binding protein or a transcription
factor by measuring the changes in the expression of a test gene which is regulated by a promoter
which is responsive to the transcription factor.

Promoter assays

A promoter assay was set up with the human hepatocellular carcinoma cell line HepG2 that was
stably transfected with a luciferase gene under the control of a gene of interest (e.g. thyroid
hormone) regulated promoter. The vector 2xIROluc, which was used for transfection, carries a
thyroid hormone responsive element (TRE) of two 12 bp inverted palindromes separated by an 8 bp
spacer in front of a tk minimal promoter and the luciferase gene. Test cultures were seeded in 96 well
plates in serum-free Eagle's Minimal Essential Medium supplemented with glutamine, tricine,
sodium pyruvate, non-essential amino acids, insulin, selenium, transferrin, and were cultivated in a
humidified atmosphere at 10% CO2 at 37°C. After 48 hours of incubation serial dilutions of test
compounds or reference compounds (e.g. L-T3, L-T4) and co-stimulator if appropriate (final
concentration 1 nM) were added to the cell cultures and incubation was continued for the optimal
time (e.g. another 4-72 hours). The cells were then lysed by addition of buffer containing Triton
X100 and luciferin and the luminescence of luciferase induced by T3 or other compounds was
measured in a luminometer. For each concentration of a test compound replicates of 4 were tested.
EC50 values for each test compound were calculated by use of the GraphPad Prism scientific
software.
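
The EC50 values here were obtained with GraphPad Prism, whose curve fitting is not reproduced below. As a rough illustration of what an EC50 represents, the following sketch averages the four replicates per concentration and finds the half-maximal point by log-linear interpolation between the two bracketing concentrations. This is a swapped-in simplification, not the fitting procedure of the software; the function name and the luminescence values are hypothetical.

```python
import math
from statistics import mean


def ec50_by_interpolation(replicates):
    """Estimate EC50 from {concentration_molar: [replicate readings]}.

    Assumes the lowest and highest concentrations sit on the bottom and
    top plateaus of the dose-response curve.
    """
    points = sorted((conc, mean(values)) for conc, values in replicates.items())
    bottom, top = points[0][1], points[-1][1]
    half = bottom + 0.5 * (top - bottom)
    for (c0, r0), (c1, r1) in zip(points, points[1:]):
        if r0 != r1 and min(r0, r1) <= half <= max(r0, r1):
            frac = (half - r0) / (r1 - r0)
            log_ec50 = math.log10(c0) + frac * (math.log10(c1) - math.log10(c0))
            return 10 ** log_ec50
    raise ValueError("half-maximal response is not bracketed by the data")


# Illustrative luminescence readings, four replicates per concentration.
luminescence = {
    1e-10: [95, 100, 105, 100],
    1e-9: [290, 300, 310, 300],
    1e-8: [1090, 1100, 1110, 1100],
    1e-7: [1890, 1900, 1910, 1900],
    1e-6: [2090, 2100, 2110, 2100],
}
ec50 = ec50_by_interpolation(luminescence)
print(f"EC50 is approximately {ec50:.2e} M")
```

A full four-parameter logistic fit would also estimate the slope and plateaus from all points and should be preferred whenever a fitting library is available.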
Screening Methods

The invention provides assays for screening test compounds which bind to or modulate the activity
of a "BREAST CANCER GENE" polypeptide or a "BREAST CANCER GENE" polynucleotide.
A test compound preferably binds to a "BREAST CANCER GENE" polypeptide or
polynucleotide. More preferably, a test compound decreases or increases "BREAST CANCER GENE"
activity by at least about 10, preferably about 50, more preferably about 75, 90, or 100% relative to
the absence of the test compound.
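
The percentage thresholds above compare activity with and without the test compound. A minimal sketch of that comparison follows, assuming activity has been measured in the same units under both conditions; the function names and the default 10% threshold are illustrative only.

```python
def percent_change(activity_with, activity_without):
    """Signed percent change in activity caused by the test compound."""
    if activity_without == 0:
        raise ValueError("reference activity must be non-zero")
    return 100.0 * (activity_with - activity_without) / activity_without


def is_modulator(activity_with, activity_without, min_percent=10.0):
    """True if activity rises or falls by at least `min_percent`."""
    return abs(percent_change(activity_with, activity_without)) >= min_percent


# Illustrative activities in arbitrary units (not measured data).
print(percent_change(50.0, 100.0))
print(is_modulator(95.0, 100.0))
print(is_modulator(175.0, 100.0, min_percent=75.0))
```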

Test Compounds 

Test compounds can be pharmacological agents already known in the art or can be compounds
previously unknown to have any pharmacological activity. The compounds can be naturally
occurring or designed in the laboratory. They can be isolated from microorganisms, animals, or
plants, and can be produced recombinantly, or synthesised by chemical methods known in the art. If
desired, test compounds can be obtained using any of the numerous combinatorial library methods
known in the art, including but not limited to, biological libraries, spatially addressable parallel
solid phase or solution phase libraries, synthetic library methods requiring deconvolution, the
one-bead one-compound library method, and synthetic library methods using affinity chromatography
selection. The biological library approach is limited to polypeptide libraries, while the other four
approaches are applicable to polypeptide, non-peptide oligomer, or small molecule libraries of
compounds. [For review see Lam, 1997, (80)].

Methods for the synthesis of molecular libraries are well known in the art [see, for example,
DeWitt et al., 1993, (81); Erb et al., 1994, (82); Zuckermann et al., 1994, (83); Cho et al., 1992,
(84); Carell et al., 1994, (85) and Gallop et al., 1994, (86)]. Libraries of compounds can be
presented in solution [see, e.g., Houghten, 1992, (87)], or on beads [Lam, 1991, (88)], DNA-chips
[Fodor, 1993, (89)], bacteria or spores (Ladner, U.S. Patent 5,223,409), plasmids [Cull et al., 1992,
(90)], or phage [Scott & Smith, 1990, (91); Devlin, 1990, (92); Cwirla et al., 1990, (93); Felici,
1991, (94)].

High Throughput Screening

Test compounds can be screened for the ability to bind to "BREAST CANCER GENE"
polypeptides or polynucleotides or to affect "BREAST CANCER GENE" activity or "BREAST
CANCER GENE" expression using high throughput screening. Using high throughput screening,
many discrete compounds can be tested in parallel so that large numbers of test compounds can be
quickly screened. The most widely established techniques utilize 96-well, 384-well or 1536-well
microtiter plates. The wells of the microtiter plates typically require assay volumes that range from
5 to 500 µl. In addition to the plates, many instruments, materials, pipettors, robotics, plate
washers, and plate readers are commercially available to fit the microwell formats.
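
When screening data from the three plate formats are handled in software, wells are commonly addressed by a row letter and column number. The sketch below assumes the usual rectangular layouts of 8 x 12, 16 x 24, and 32 x 48 wells with row-major numbering starting at zero; the function name and numbering convention are illustrative assumptions, and instruments may number wells differently.

```python
# (rows, columns) for the three plate formats named in the text.
PLATE_LAYOUTS = {96: (8, 12), 384: (16, 24), 1536: (32, 48)}


def well_name(index, plate_size=384):
    """Convert a zero-based, row-major well index into a label such as 'B7'."""
    rows, cols = PLATE_LAYOUTS[plate_size]
    if not 0 <= index < rows * cols:
        raise IndexError("well index outside plate")
    row, col = divmod(index, cols)
    # Rows beyond Z continue as AA, AB, ... (needed for 1536-well plates).
    if row < 26:
        letters = chr(ord("A") + row)
    else:
        letters = "A" + chr(ord("A") + row - 26)
    return f"{letters}{col + 1}"


print(well_name(0, 96), well_name(95, 96))
print(well_name(383, 384))
print(well_name(1535, 1536))
```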

Alternatively, free format assays, or assays that have no physical barrier between samples, can be
used. For example, an assay using pigment cells (melanocytes) in a simple homogeneous assay for
combinatorial peptide libraries is described by Jayawickreme et al., (95). The cells are placed
under agarose in culture dishes, then beads that carry combinatorial compounds are placed on the
surface of the agarose. The combinatorial compounds are partially released from
the beads. Active compounds can be visualised as dark pigment areas because, as the compounds
diffuse locally into the gel matrix, the active compounds cause the cells to change colors.

Another example of a free format assay is described by Chelsky, (96). Chelsky placed a simple 
homogenous enzyme assay for carbonic anhydrase inside an agarose gel such that the enzyme in 
the gel would cause a color change throughout the gel. Thereafter, beads carrying combinatorial 
compounds via a photolinker were placed inside the gel and the compounds were partially released 
by UV light. Compounds that inhibited the enzyme were observed as local zones of inhibition 
having less color change. 

In another example, combinatorial libraries were screened for compounds that had cytotoxic
effects on cancer cells growing in agar [Salmon et al., 1996, (97)].

Another high throughput screening method is described in Beutel et al., U.S. Patent 5,976,813. In
this method, test samples are placed in a porous matrix. One or more assay components are then
placed within, on top of, or at the bottom of a matrix such as a gel, a plastic sheet, a filter, or other
form of easily manipulated solid support. When samples are introduced to the porous matrix, they
diffuse sufficiently slowly, such that the assays can be performed without the test samples running
together.

Binding Assays

For binding assays, the test compound is preferably a small molecule which binds to and occupies,
for example, the ATP/GTP binding site of the enzyme or the active site of a "BREAST CANCER
GENE" polypeptide, such that normal biological activity is prevented. Examples of such small
molecules include, but are not limited to, small peptides or peptide-like molecules.

In binding assays, either the test compound or a "BREAST CANCER GENE" polypeptide can
comprise a detectable label, such as a fluorescent, radioisotopic, chemiluminescent, or enzymatic
label, such as horseradish peroxidase, alkaline phosphatase, or luciferase. Detection of a test
compound which is bound to a "BREAST CANCER GENE" polypeptide can then be
accomplished, for example, by direct counting of radioemission, by scintillation counting, or by
determining conversion of an appropriate substrate to a detectable product.

Alternatively, binding of a test compound to a "BREAST CANCER GENE" polypeptide can be
determined without labeling either of the interactants. For example, a microphysiometer can be
used to detect binding of a test compound with a "BREAST CANCER GENE" polypeptide. A
microphysiometer (e.g., Cytosensor) is an analytical instrument that measures the rate at which a
cell acidifies its environment using a light-addressable potentiometric sensor (LAPS). Changes in
this acidification rate can be used as an indicator of the interaction between a test compound and a
"BREAST CANCER GENE" polypeptide [McConnell et al., 1992, (98)].

Determining the ability of a test compound to bind to a "BREAST CANCER GENE" polypeptide
also can be accomplished using a technology such as real-time Bimolecular Interaction Analysis
(BIA) [Sjolander & Urbaniczky, 1991, (99), and Szabo et al.]. BIA is a technology for studying
biospecific interactions in real time, without labeling any of the interactants (e.g.,
BIAcore). Changes in the optical phenomenon surface plasmon resonance (SPR) can be used as
an indication of real-time reactions between biological molecules.

In yet another aspect of the invention, a "BREAST CANCER GENE" polypeptide can be used as
a "bait protein" in a two-hybrid assay or three-hybrid assay [..., (102); ..., (103); ... et al.,
(104), and Brent WO 94/10300], to identify other proteins which bind to or interact with the
"BREAST CANCER GENE" polypeptide and modulate its activity.

The two-hybrid system is based on the modular nature of most transcription factors, which consist
of separable DNA-binding and activation domains. Briefly, the assay utilizes two different DNA
constructs. For example, in one construct, polynucleotide encoding a "BREAST CANCER GENE"
polypeptide can be fused to a polynucleotide encoding the DNA binding domain of a known
transcription factor (e.g., GAL4). In the other construct a DNA sequence that encodes an
unidentified protein ("prey" or "sample") can be fused to a polynucleotide that codes for the
activation domain of the known transcription factor. If the "bait" and the "prey" proteins are able to
interact in vivo to form a protein-dependent complex, the DNA-binding and activation
domains of the transcription factor are brought into close proximity. This proximity allows
transcription of a reporter gene (e.g., LacZ), which is operably linked to a transcriptional
regulatory site responsive to the transcription factor. Expression of the reporter gene can be
detected, and cell colonies containing the functional transcription factor can be isolated.
It may be desirable to immobilize either a "BREAST CANCER GENE" polypeptide (or
polynucleotide) or the test compound [...] to a solid
support, including use of covalent and non-covalent linkages or pairs of
binding moieties attached respectively to the polypeptide (or polynucleotide) or test compound and
the solid support. Test compounds are preferably bound to the solid support in an array, so that
the location of individual test compounds can be tracked. Binding of a test compound to a "BREAST
CANCER GENE" polypeptide (or polynucleotide) can be accomplished in any vessel suitable for
containing the reactants. Examples of such vessels include microtiter plates, test tubes, and
microcentrifuge tubes.

[...] "BREAST CANCER GENE" polypeptide; the mixture is then incubated under conditions
conducive to complex formation (e.g., at physiological conditions for salt and pH). Following
incubation, the beads or microtiter plate wells are washed to remove any unbound components.
Binding of the interactants can be determined either directly or indirectly, as described above.
Alternatively, the complexes can be dissociated from the solid support before binding is determined.

Other techniques for immobilizing proteins or polynucleotides on a solid support also can be used
in the screening assays of the invention. For example, either a "BREAST CANCER GENE"
polypeptide (or polynucleotide) or a test compound can be immobilized utilizing conjugation of
biotin and streptavidin. Biotinylated "BREAST CANCER GENE" polypeptides (or
polynucleotides) or test compounds can be prepared from biotin NHS (N-hydroxysuccinimide)
using techniques well known in the art (e.g., biotinylation kit [...])
and immobilized in the wells of streptavidin-coated 96 well plates (Pierce Chemical).
Alternatively, antibodies which specifically bind to a "BREAST CANCER GENE" polypeptide,
polynucleotide, or a test compound, but which do not interfere with a desired binding site, such as
the ATP/GTP binding site or the active site of the "BREAST CANCER GENE" polypeptide, can
be derivatised to the wells of the plate. Unbound target or protein can be trapped in the wells by
antibody conjugation.

Methods for detecting such complexes, in addition to those described above for the GST-immobilized
complexes, include immunodetection of complexes using antibodies which specifically bind to a
"BREAST CANCER GENE" polypeptide or test compound, enzyme-linked assays which rely on detecting an
activity of a "BREAST CANCER GENE" polypeptide, and SDS gel electrophoresis under non-reducing
conditions.

Screening for test compounds which bind to a "BREAST CANCER GENE" polypeptide or polynucleotide
also can be carried out in an intact cell. Any cell which comprises a "BREAST CANCER GENE"
polypeptide or polynucleotide can be used in a cell-based assay system. A "BREAST CANCER GENE"
polynucleotide can be naturally occurring in the cell or can be introduced using techniques such as
those described above. Binding of the test compound to a "BREAST CANCER GENE" polypeptide or
polynucleotide is determined as described above.

Modulation of Gene Expression

In another embodiment, test compounds which increase or decrease "BREAST CANCER GENE" expression
are identified. A "BREAST CANCER GENE" polynucleotide is contacted with a test compound in an
appropriate expression test system as described below or in a cell system, and the expression of an
RNA or polypeptide product of the "BREAST CANCER GENE" polynucleotide is determined. The level of
expression of appropriate mRNA or polypeptide in the presence of the test compound is compared to
the level of expression of mRNA or polypeptide in the absence of the test compound. The test
compound can then be identified as a modulator of expression based on this comparison. For example,
when expression of mRNA or polypeptide is greater in the presence of the test compound than in its
absence, the test compound is identified as a stimulator or enhancer of the mRNA or polypeptide
expression. Alternatively, when expression of the mRNA or polypeptide is less in the presence of
the test compound than in its absence, the test compound is identified as an inhibitor of the mRNA
or polypeptide expression.
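
By way of illustration only, the comparison described above can be sketched as follows. The function name, the returned labels and the optional tolerance parameter are hypothetical and are not part of the disclosure; the inputs are assumed to be expression levels measured on the same scale with and without the test compound.

```python
def classify_modulator(expr_with: float, expr_without: float,
                       tolerance: float = 0.0) -> str:
    """Classify a test compound from expression measured with and without it.

    `tolerance` is a hypothetical margin below which a difference is
    treated as no change; the text itself only compares greater/less.
    """
    difference = expr_with - expr_without
    if difference > tolerance:
        return "enhancer"      # expression is greater in the presence of the compound
    if difference < -tolerance:
        return "inhibitor"     # expression is less in the presence of the compound
    return "no change"


if __name__ == "__main__":
    print(classify_modulator(expr_with=2.0, expr_without=1.0))
    print(classify_modulator(expr_with=0.4, expr_without=1.0))
```

In practice a non-zero tolerance would be chosen from the variability of replicate measurements.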

The level of "BREAST CANCER GENE" mRNA or polypeptide expression in the cells can be determined by
methods well known in the art for detecting mRNA or polypeptide. Either qualitative or quantitative
methods can be used. The presence of polypeptide products of a "BREAST CANCER GENE" polynucleotide
can be determined, for example, using a variety of techniques known in the art, including
immunochemical methods such as radioimmunoassay, Western blotting, and immunohistochemistry.
Alternatively, polypeptide synthesis can be determined in vivo, in a cell culture, or in an in
vitro translation system by detecting incorporation of labeled amino acids into a "BREAST CANCER
GENE" polypeptide.

Such screening can be carried out either in a cell-free assay system or in an intact cell. Any cell
which expresses a "BREAST CANCER GENE" polynucleotide can be used in a cell-based assay system. A
"BREAST CANCER GENE" polynucleotide can be naturally occurring in the cell or can be introduced
using techniques such as those described above. Either a primary culture or an established cell
line, such as CHO or human embryonic kidney 293 cells, can be used.

Therapeutic Indications and Methods

Therapies for treatment of breast cancer have primarily relied upon effective chemotherapeutic
drugs for intervention in cell proliferation, cell growth or angiogenesis. The advent of
genomics-driven molecular target identification has opened up the possibility of identifying new
breast cancer-specific targets for therapeutic intervention that will provide safer, more effective
treatments for malignant neoplasia patients and breast cancer patients in particular. Thus newly
discovered breast cancer-associated genes and their products can be used as tools to develop
innovative therapies. The identification of the Her2/neu receptor kinase presents exciting new
opportunities for treatment of a certain subset of tumor patients as described before. Genes playing
important roles in any of the physiological processes outlined above can be characterized as breast
cancer targets. Genes or gene fragments identified through genomics can readily be expressed in
one or more heterologous expression systems to produce functional recombinant proteins. These
proteins are characterized in vitro for their biochemical properties and then used as tools in
high-throughput molecular screening programs to identify chemical modulators of their biochemical
activities. Modulators of target gene expression or protein activity can be identified in this
manner and subsequently tested in cellular and in vivo disease models for therapeutic activity.
Optimization of lead compounds with iterative testing in biological models and detailed
pharmacokinetic and toxicological analyses form the basis for drug development and subsequent
testing in humans.

This invention further pertains to the use of novel agents identified by the screening assays
described above. Accordingly, it is within the scope of this invention to use a test compound
identified as described herein in an appropriate animal model. For example, an agent identified as
described herein (e.g., a modulating agent, an antisense polynucleotide molecule, a specific
antibody, ribozyme, or a human "BREAST CANCER GENE" polypeptide binding molecule) can be used in an
animal model to determine the efficacy, toxicity, or side effects of treatment with such an agent.
Alternatively, an agent identified as described herein can be used in an animal model to determine
the mechanism of action of such an agent. Furthermore, this invention pertains to uses of novel
agents identified by the above described screening assays for treatments as described herein.

A reagent which affects human "BREAST CANCER GENE" activity can be administered to a human cell,
either in vitro or in vivo, to reduce or increase human "BREAST CANCER GENE" activity. The reagent
preferably binds to an expression product of a human "BREAST CANCER GENE". If the expression
product is a protein, the reagent is preferably an antibody. For treatment of human cells ex vivo,
an antibody can be added to a preparation of stem cells which have been removed from the body. The
cells can then be replaced in the same or another human body, with or without clonal propagation,
as is known in the art.

In one embodiment, the reagent is delivered using a liposome. Preferably, the liposome is stable in
the animal into which it has been administered for at least about 30 minutes, more preferably for at
least about 1 hour, and even more preferably for at least about 24 hours. A liposome comprises a
lipid composition that is capable of targeting a reagent, particularly a polynucleotide, to a
particular site in an animal, such as a human. Preferably, the lipid composition of the liposome is
capable of targeting to a specific organ of an animal, such as the lung, liver, spleen, heart,
brain, lymph nodes, and skin.

A liposome useful in the present invention comprises a lipid composition that is capable of fusing
with the plasma membrane of the targeted cell to deliver its contents to the cell. Preferably, the
transfection efficiency of a liposome is about 0.5 µg of DNA per 16 nmol of liposome delivered to
about 10^6 cells, more preferably about 1.0 µg of DNA per 16 nmol of liposome delivered to about
10^6 cells, and even more preferably about 2.0 µg of DNA per 16 nmol of liposome delivered to
about 10^6 cells. Preferably, a liposome is between about 100 and 500 nm, more preferably between
about 150 and 450 nm, and even more preferably between about 200 and 400 nm in diameter.

Suitable liposomes for use in the present invention include those liposomes usually used in, for
example, gene delivery methods known to those of skill in the art. More preferred liposomes
include liposomes having a polycationic lipid composition and/or liposomes having a cholesterol
backbone conjugated to polyethylene glycol. Optionally, a liposome comprises a compound capable of
targeting the liposome to a particular cell type, such as a cell-specific ligand exposed on the
outer surface of the liposome.

Complexing a liposome with a reagent such as an antisense oligonucleotide or ribozyme can be
achieved using methods which are standard in the art (see, for example, U.S. Patent 5,705,151).
Preferably, from about 0.1 µg to about 10 µg of polynucleotide is combined with about 8 nmol of
liposomes, more preferably from about 0.5 µg to about 5 µg of polynucleotides are combined with
about 8 nmol liposomes, and even more preferably about 1.0 µg of polynucleotides is combined
with about 8 nmol liposomes.

In another embodiment, antibodies can be delivered to specific tissues in vivo using
receptor-mediated targeted delivery. Receptor-mediated DNA delivery techniques are taught in, for
example, Findeis et al., 1993, (105); Chiou et al., 1994, (106); Wu & Wu, 1988, (107); Wu et al.,
1994, (108); Zenke et al., 1990, (109); Wu et al., 1991, (110).

Determination of a Therapeutically Effective Dose

The determination of a therapeutically effective dose is well within the capability of those skilled
in the art. A therapeutically effective dose refers to that amount of active ingredient which
increases or decreases human "BREAST CANCER GENE" activity relative to the human "BREAST CANCER
GENE" activity which occurs in the absence of the therapeutically effective dose.

For any compound, the therapeutically effective dose can be estimated initially either in cell
culture assays or in animal models, usually mice, rabbits, dogs, or pigs. The animal model also can
be used to determine the appropriate concentration range and route of administration. Such
information can then be used to determine useful doses and routes for administration in humans.

Therapeutic efficacy and toxicity, e.g. ED50 (the dose therapeutically effective in 50% of the
population) and LD50 (the dose lethal to 50% of the population), can be determined by standard
pharmaceutical procedures in cell cultures or experimental animals. The dose ratio of toxic to
therapeutic effects is the therapeutic index, and it can be expressed as the ratio LD50/ED50.
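
As a purely illustrative sketch, the ratio LD50/ED50 can be computed as shown below. The function name and the dose values in the usage example are hypothetical; both doses are assumed to be expressed in the same units.

```python
def therapeutic_index(ld50: float, ed50: float) -> float:
    """Return the therapeutic index, i.e. the ratio LD50/ED50.

    Both doses must be positive and given in the same units
    (for example mg per kg of body weight).
    """
    if ld50 <= 0 or ed50 <= 0:
        raise ValueError("LD50 and ED50 must be positive")
    return ld50 / ed50


if __name__ == "__main__":
    # Hypothetical doses chosen only to show the arithmetic.
    print(therapeutic_index(ld50=500.0, ed50=25.0))  # 20.0
```

A larger value indicates a wider margin between the effective dose and the lethal dose.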

Pharmaceutical compositions which exhibit large therapeutic indices are preferred. The data
obtained from cell culture assays and animal studies is used in formulating a range of dosage for
human use. The dosage contained in such compositions is preferably within a range of circulating
concentrations that include the ED50 with little or no toxicity. The dosage varies within this range
depending upon the dosage form employed, sensitivity of the patient, and the route of
administration.

The exact dosage will be determined by the practitioner, in light of factors related to the subject
that requires treatment. Dosage and administration are adjusted to provide sufficient levels of the
active ingredient or to maintain the desired effect. Factors which can be taken into account include
the severity of the disease state, general health of the subject, age, weight, and gender of the
subject, diet, time and frequency of administration, drug combination(s), reaction sensitivities,
and tolerance/response to therapy. Long-acting pharmaceutical compositions can be administered every
3 to 4 days, every week, or once every two weeks depending on the half-life and clearance rate of
the particular formulation.

Normal dosage amounts can vary from 0.1 to 100,000 micrograms, up to a total dose of about 1 g,
depending upon the route of administration. Guidance as to particular dosages and methods of
delivery is provided in the literature and generally available to practitioners in the art. Those
skilled in the art will employ different formulations for nucleotides than for proteins or their
inhibitors. Similarly, delivery of polynucleotides or polypeptides will be specific to particular
cells, conditions, locations, etc.

If the reagent is a single-chain antibody, polynucleotides encoding the antibody can be constructed
and introduced into a cell either ex vivo or in vivo using well-established techniques including,
but not limited to, transferrin-polycation-mediated DNA transfer, transfection with naked or
encapsulated nucleic acids, liposome-mediated cellular fusion, intracellular transportation of
DNA-coated latex beads, and DEAE- or calcium phosphate-mediated transfection.

Effective in vivo dosages of an antibody are in the range of about 5 µg to about 50 µg/kg, about 5,

21 TIT? about 100 " to 500 of ^ 

about 250 µg/kg of patient body weight. For administration of polynucleotides encoding single-chain
I "so ta ^ ^ ~ * fc "» * "~ 100 - * *- ™ ng 500 

If the expression product is mRNA, the reagent is preferably an antisense oligonucleotide or a
ribozyme. Polynucleotides which express antisense oligonucleotides or ribozymes can be introduced
into cells by a variety of methods, as described above.

Preferably, a reagent reduces expression of a "BREAST CANCER GENE" gene or the activity of a
"BREAST CANCER GENE" polypeptide by at least about 10, preferably about 50, more preferably about
75, 90, or 100% relative to the absence of the reagent. The effectiveness of the mechanism chosen to
decrease the level of expression of a "BREAST CANCER GENE" gene or the activity of a "BREAST CANCER
GENE" polypeptide can be assessed using methods well known in the art, such as hybridization of
nucleotide probes to "BREAST CANCER GENE"-specific mRNA, quantitative RT-PCR, immunologic detection
of a "BREAST CANCER GENE" polypeptide, or measurement of "BREAST CANCER GENE" activity.

In any of the embodiments described above, any of the pharmaceutical compositions of the
invention can be administered in combination with other appropriate therapeutic agents. Selection
of the appropriate agents for use in combination therapy can be made by one of ordinary skill in
the art, according to conventional pharmaceutical principles. The combination of therapeutic
agents can act synergistically to effect the treatment or prevention of the various disorders
described above. Using this approach, one may be able to achieve therapeutic efficacy with lower
dosages of each agent, thus reducing the potential for adverse side effects.

Any of the therapeutic methods described above can be applied to any subject in need of such
therapy, including, for example, birds and mammals such as dogs, cats, cows, pigs, sheep, goats,
horses, rabbits, monkeys, and most preferably, humans.

All patents and patent applications cited in this disclosure are expressly incorporated herein by 
reference. The above disclosure generally describes the present invention. A more complete 
understanding can be obtained by reference to the following specific examples which are provided 
for purposes of illustration only and are not intended to limit the scope of the invention. 

Pharmaceutical Compositions

The invention also provides pharmaceutical compositions which can be administered to a patient
to achieve a therapeutic effect. Pharmaceutical compositions of the invention can comprise, for
example, a "BREAST CANCER GENE" polypeptide, "BREAST CANCER GENE" polynucleotide, ribozymes or
antisense oligonucleotides, antibodies which specifically bind to a "BREAST CANCER GENE"
polypeptide, or mimetics, agonists, antagonists, or inhibitors of a "BREAST CANCER GENE"
polypeptide activity. The compositions can be administered alone or in combination with at least
one other agent, such as a stabilizing compound, which can be administered in any sterile,
biocompatible pharmaceutical carrier, including, but not limited to, saline, buffered saline,
dextrose, and water. The compositions can be administered to a patient alone, or in combination
with other agents, drugs or hormones.

In addition to the active ingredients, these pharmaceutical compositions can contain suitable
pharmaceutically acceptable carriers comprising excipients and auxiliaries which facilitate
processing of the active compounds into preparations which can be used pharmaceutically.
Pharmaceutical compositions of the invention can be administered by any number of routes
including, but not limited to, oral, intravenous, intramuscular, intraarterial, intramedullary,
intrathecal, intraventricular, transdermal, subcutaneous, intraperitoneal, intranasal, parenteral,
topical, sublingual, or rectal means. Pharmaceutical compositions for oral administration can be
formulated using pharmaceutically acceptable carriers well known in the art in dosages suitable for
oral administration. Such carriers enable the pharmaceutical compositions to be formulated as
tablets, pills, dragees, capsules, liquids, gels, syrups, slurries, suspensions, and the like, for
ingestion by the patient.

Pharmaceutical preparations for oral use can be obtained through combination of active
compounds with solid excipient, optionally grinding a resulting mixture, and processing the
mixture of granules, after adding suitable auxiliaries, if desired, to obtain tablets or dragee
cores. Suitable excipients are carbohydrate or protein fillers, such as sugars, including lactose,
sucrose, mannitol, or sorbitol; starch from corn, wheat, rice, potato, or other plants; cellulose
such as methyl cellulose, hydroxypropylmethylcellulose, or sodium carboxymethylcellulose; gums
including arabic and tragacanth; and proteins such as gelatin and collagen. If desired,
disintegrating or solubilizing agents can be added, such as the cross-linked polyvinyl pyrrolidone,
agar, alginic acid, or a salt thereof, such as sodium alginate.

Dragee cores can be used in conjunction with suitable coatings, such as concentrated sugar
solutions, which also can contain gum arabic, talc, polyvinylpyrrolidone, carbopol gel,
polyethylene glycol, and/or titanium dioxide, lacquer solutions, and suitable organic solvents or
solvent mixtures. Dyestuffs or pigments can be added to the tablets or dragee coatings for product
identification or to characterize the quantity of active compound, i.e., dosage.

Pharmaceutical preparations which can be used orally include push-fit capsules made of gelatin, as
well as soft, sealed capsules made of gelatin and a coating, such as glycerol or sorbitol. Push-fit
capsules can contain active ingredients mixed with a filler or binders, such as lactose or starches,
lubricants, such as talc or magnesium stearate, and, optionally, stabilizers. In soft capsules, the
active compounds can be dissolved or suspended in suitable liquids, such as fatty oils, liquid, or
liquid polyethylene glycol with or without stabilizers.

Pharmaceutical formulations suitable for parenteral administration can be formulated in aqueous
solutions, preferably in physiologically compatible buffers such as Hanks' solution, Ringer's
solution, or physiologically buffered saline. Aqueous injection suspensions can contain substances
which increase the viscosity of the suspension, such as sodium carboxymethyl cellulose, sorbitol,
or dextran. Additionally, suspensions of the active compounds can be prepared as appropriate oily
injection suspensions. Suitable lipophilic solvents or vehicles include fatty oils such as sesame
oil, or synthetic fatty acid esters, such as ethyl oleate or triglycerides, or liposomes. Non-lipid
polycationic amino polymers also can be used for delivery. Optionally, the suspension also can
contain suitable stabilizers or agents which increase the solubility of the compounds to allow for
the preparation of highly concentrated solutions. For topical or nasal administration, penetrants
appropriate to the particular barrier to be permeated are used in the formulation. Such penetrants
are generally known in the art.

composition can be provided *q * coi* n a ^ ^ F"*nnaceuncai 
** ta •*""» OT — ~ are dre co^ ng ZUZZ T 

is combined with buffer prior to use.

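Further details on techniques for formulation and administration can be found in the latest edition
of Remington's Pharmaceutical Sciences. After pharmaceutical compositions have been prepared, they
can be placed in an appropriate container and labeled for treatment of an indicated condition. Such
labeling would include amount, frequency, and method of administration.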

One strategy for identifying genes that are involved in breast cancer is to detect genes that are
expressed differentially under conditions associated with the disease versus non-disease conditions
or in the context of therapy response conditions. The sub-sections below describe a number of
experimental systems which can be used to detect such differentially expressed genes. In general,
these experimental systems include at least one experimental condition in which subjects or samples
are treated in a manner associated with breast cancer, in addition to at least one experimental
control condition lacking such disease associated treatment. Differentially expressed genes are
detected, as described below, by comparing the pattern of gene expression between the experimental
and control conditions.

Once a particular gene has been identified through the use of one such experiment, its expression
pattern may be further characterized by studying its expression in a different experiment, and the
findings may be validated by an independent technique. Such use of multiple experiments may be
useful in distinguishing the roles and relative importance of particular genes in breast cancer and
the treatment thereof. A combined approach, comparing gene expression patterns in cells derived
from breast cancer patients to those of in vitro cell culture models, can give substantial
information on the role of such genes in the development of resistance or insensitivity to certain
therapeutic agents (e.g. chemotherapeutic drugs).

Among the experiments which may be utilized for the identification of differentially expressed
genes involved in malignant neoplasia, and breast cancer in particular, are experiments designed to
analyze those genes which are involved in signal transduction. Such experiments may serve to
identify genes involved in the proliferation of cells.

Methods are described below for the identification of genes which are involved in breast cancer.
Such genes are differentially expressed in breast cancer conditions relative to their expression in
normal, or non-breast cancer conditions or upon experimental manipulation based on clinical
observations. Such differentially expressed genes represent "target" and/or "marker" genes.
Methods for the further characterization of such differentially expressed genes, and for their
identification as target and/or marker genes, are presented below.

Alternatively, a differentially expressed gene may have its expression modulated, i.e.,
quantitatively increased or decreased, in normal versus breast cancer states, or under control
versus experimental conditions. The degree to which expression differs in normal versus breast
cancer or control versus experimental states need only be large enough to be visualized via standard
characterization techniques, such as, for example, the differential display technique described
below. Other such standard characterization techniques by which expression differences may be
visualized include but are not limited to quantitative RT-PCR and Northern analyses, which are
well known to those of skill in the art.

In addition to the experiments described above, the following describes algorithms and statistical
analyses which can be utilized for data evaluation and for the classification as well as response
prediction for a so far not classified biological sample in the context of control samples.
Predictive algorithms and equations described below have already shown their power to subdivide
individual cancers.

EXAMPLE 1

Expression profiling utilizing quantitative kinetic RT-PCR

Using the Sequence Detection System of PE Applied Biosystems (Perkin Elmer) with a fluorogenic
probe labeled with both a reporter dye and a quencher dye, one can perform such an expression
measurement. Amplification of the probe-specific product causes cleavage of the probe, generating
an increase in reporter fluorescence. Primers and probes were selected using the Primer Express
software and localized mostly in the 3' region of the coding sequence or in the 3' untranslated
region (see Table 5 for primer and probe sequences). All primer pairs were checked for specificity
by conventional PCR reactions and gel electrophoresis. To standardize the amount of sample RNA,
GAPDH was selected as a reference, since it was not differentially regulated in the samples
analyzed. To perform such an expression analysis of genes within biological samples, the respective
primer/probes are prepared by mixing 25 µl of the 100 µM stock solution of the upper primer and
25 µl of the 100 µM stock solution of the lower primer with the stock solution of the TaqMan probe
(FAM/TAMRA), and adjusting to 500 µl with aqua dest (primer/probe mix). For each reaction, 1.25 µl
cDNA of the patient samples were mixed with 8.75 µl nuclease-free water and added to one well of a
96 well optical reaction plate. The primer/probe mix described above, 12.5 µl TaqMan Universal PCR
mix (2x) (Applied Biosystems Part No. 4318157) and 1 µl water are then added. The 96 well plates
are closed with 8 caps/strips (Applied Biosystems Part Number 4323032) and centrifuged for 3
minutes. Measurements of the PCR reaction are done according to the instructions of the
manufacturer with a TaqMan 7900 under appropriate conditions (2 min. 50 °C, 10 min. 95 °C,
0.5 min. 95 °C, 1 min. 60 °C; 40 cycles). Prior to the measurement of so far unclassified
biological samples, control experiments with, e.g., cell lines, healthy control samples, or samples
of defined therapy response could be used for standardization of the experimental conditions.

TaqMan validation experiments were performed showing that the efficiencies of the target and the
control amplifications are approximately equal, which is a prerequisite for the relative
quantification of gene expression by the comparative ΔΔCT method, known to those with skills in
the art. For this purpose the software SDS 2.0 from Applied Biosystems can be used according to the
respective instructions. CT values are then further analyzed with appropriate software (Microsoft
Excel) or statistical software packages (SAS).

In addition to the technology described above, provided by Perkin Elmer, one may use other
technique implementations like the LightCycler from Roche Inc. or the iCycler from Stratagene Inc.,
capable of real time detection of an RT-PCR reaction.
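
As an illustration only, the comparative ΔΔCT quantification referred to in this example can be sketched as follows, with GAPDH serving as the reference gene. The function and parameter names and the CT values in the usage example are hypothetical; the sketch assumes that the amount of product doubles with each cycle, which corresponds to the approximately equal and near-complete amplification efficiencies stated as a prerequisite.

```python
def relative_expression(ct_target_sample: float, ct_ref_sample: float,
                        ct_target_control: float, ct_ref_control: float) -> float:
    """Relative expression of a target gene by the comparative delta-delta-CT method.

    Each CT is the threshold cycle of the target gene or of the reference
    gene (e.g. GAPDH) in the sample of interest or in the control sample.
    Assumes one doubling of product per PCR cycle.
    """
    delta_ct_sample = ct_target_sample - ct_ref_sample      # normalise to the reference gene
    delta_ct_control = ct_target_control - ct_ref_control
    delta_delta_ct = delta_ct_sample - delta_ct_control     # compare sample with control
    return 2.0 ** (-delta_delta_ct)


if __name__ == "__main__":
    # Hypothetical CT values: the target crosses the threshold two cycles
    # earlier in the sample than in the control, at equal GAPDH CT.
    fold = relative_expression(ct_target_sample=24.0, ct_ref_sample=18.0,
                               ct_target_control=26.0, ct_ref_control=18.0)
    print(fold)  # 4.0
```

A result above 1 indicates higher expression in the sample than in the control, a result below 1 lower expression.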

EXAMPLE 2 

Expression profiling utilizing DNA microarrays 

Expression profiling can be carried out using the Affymetrix Array Technology. By hybridization
of mRNA to such a DNA-array or DNA-chip, it is possible to identify the expression value of
each transcript from the signal intensity at a certain position of the array. Usually these
DNA-arrays are produced by spotting of cDNA, oligonucleotides or subcloned DNA fragments. In the
case of Affymetrix technology, approximately 400,000 individual oligonucleotide sequences were
synthesized on the surface of a silicon wafer at distinct positions. The minimal length of
oligomers is 12 nucleotides, preferably 25 nucleotides or the full length of the questioned
transcript. Expression profiling may also be carried out by hybridization to nylon or
nitrocellulose membrane bound DNA or oligonucleotides. Detection of signals derived from
hybridization may be obtained by either colorimetric, fluorescent, electrochemical, electronic,
optic or radioactive readout. Detailed descriptions of array construction have been mentioned
above and in other patents cited. To determine the quantitative and qualitative changes in the
chromosomal region to be analysed, RNA from tumor tissue which is suspected to contain such genomic
alterations has to be compared to RNA extracted from benign tissue (e.g. epithelial breast tissue,
or micro dissected ductal tissue) on the basis of expression profiles for the whole transcriptome.
With minor modifications, the sample preparation protocol followed the Affymetrix GeneChip
Expression Analysis Manual (Santa Clara, CA). Total RNA extraction and isolation from tumor or
benign tissues, biopsies, cell isolates or cell containing body fluids can be performed by using
TRIzol (Life Technologies, Rockville, MD) and Oligotex mRNA Midi kit (Qiagen, Hilden, Germany), and
an ethanol precipitation step should be carried out to bring the concentration to 1 mg/ml. 5-10 mg
of mRNA are used to create double stranded cDNA by the Superscript system (Life Technologies).
First strand cDNA synthesis was primed with a T7-(dT24) oligonucleotide. The cDNA can be extracted
with phenol/chloroform and precipitated with ethanol to a final concentration of 1 mg/ml. From the
generated cDNA, cRNA can be synthesized using Enzo's (Enzo Diagnostics Inc., Farmingdale, NY) in
vitro Transcription Kit. Within the same step the cRNA can be labeled with biotin nucleotides
Bio-11-CTP and Bio-16-UTP (Enzo Diagnostics Inc., Farmingdale, NY). After labeling and cleanup
(Qiagen, Hilden, Germany) the cRNA then should be fragmented in an appropriate fragmentation buffer
(e.g., 40 mM Tris-Acetate, pH 8.1, 100 mM KOAc, 30 mM MgOAc, for 35 minutes at 94 °C). As per the
Affymetrix protocol, fragmented cRNA should be hybridized on the HG_U133 arrays A and B,
comprising approximately 40,000 probed transcripts each, for 24 hours at 60 rpm in a 45 °C
hybridization oven. After the hybridization step the chip surfaces have to be washed and stained
with streptavidin phycoerythrin (SAPE; Molecular Probes, Eugene, OR) in Affymetrix fluidics
stations. To amplify staining, a second labeling step can be introduced, which is recommended but
not compulsory. Here one should add SAPE. Hybridization to the probe arrays may be detected by
fluorometric scanning (Hewlett Packard Gene Array Scanner; Hewlett Packard Corporation, Palo Alto,
CA).

After hybridization and scanning, the microarray images can be analyzed for quality control,
looking for major chip defects or abnormalities in hybridization signal. Primary data analysis
should then be carried out. In the case of the genes analyzed in one embodiment of this invention,
the primary data have been analyzed by further bioinformatic tools and additional filter criteria.
The bioinformatic analysis is described in detail below.

EXAMPLE 3

Data analysis from expression profiling experiments 

According to the Affymetrix measurement technique (Affymetrix GeneChip Expression Analysis
Manual, Santa Clara, CA), a single gene expression measurement on one chip yields the average
difference value and the absolute call. Each chip contains 16-20 oligonucleotide probe pairs per
gene or cDNA clone. These probe pairs include perfectly matched sets and mismatched sets, both
of which are necessary for the calculation of the average difference, a measure of the intensity
difference for each probe pair, calculated by subtracting the intensity of the mismatch from the
intensity of the perfect match. This takes into consideration variability in hybridization among
probe pairs and other hybridization artifacts that could affect the fluorescence intensities. The
average difference is a numeric value supposed to represent the expression value of that gene. The
absolute call can take the values 'A' (absent), 'M' (marginal), or 'P' (present) and denotes the
quality of a single hybridization. We used both the absolute call and the average difference to
identify genes which are differentially expressed in biological samples from individuals with
breast cancer versus biological samples from the normal population. With other algorithms than the
Affymetrix one we have obtained different numerical values representing the same expression values
and expression differences upon comparison.

The differential expression E in one of the breast cancer groups compared to the normal population
is calculated as follows. Given n average difference values d_1, d_2, ..., d_n in the breast cancer
population and m average difference values c_1, c_2, ..., c_m in the population of normal
individuals, E is computed by the equation:



$$E = \exp\left(\frac{1}{n}\sum_{j=1}^{n}\ln d_j \;-\; \frac{1}{m}\sum_{i=1}^{m}\ln c_i\right) \qquad \text{(equation 1)}$$



If d_j < 50 or c_i < 50 for one or more values of i and j, these particular values c_i and/or d_j
are set to an "artificial" expression value of 50. This particular computation of E allows for a
correct comparison to TaqMan results.
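
A minimal sketch of this computation is given below for illustration. It assumes that E is taken as the ratio of the geometric mean of the breast cancer values d_j to the geometric mean of the normal values c_i, so that values above 1 correspond to higher expression in breast cancer; this form, the function name and the example values are assumptions made for the sketch. The floor of 50 follows the text.

```python
import math

EXPRESSION_FLOOR = 50.0  # "artificial" expression value applied to low measurements


def differential_expression(cancer_values, normal_values,
                            floor: float = EXPRESSION_FLOOR) -> float:
    """Fold change E of the breast cancer group versus the normal group.

    Assumption: E = exp(mean(ln d_j) - mean(ln c_i)), i.e. the ratio of the
    geometric means, computed after raising every value below `floor` to `floor`.
    """
    d = [max(float(v), floor) for v in cancer_values]
    c = [max(float(v), floor) for v in normal_values]
    if not d or not c:
        raise ValueError("both groups need at least one average difference value")
    mean_log_d = sum(math.log(v) for v in d) / len(d)
    mean_log_c = sum(math.log(v) for v in c) / len(c)
    return math.exp(mean_log_d - mean_log_c)


if __name__ == "__main__":
    # Hypothetical average difference values.
    print(differential_expression([200.0, 800.0], [100.0, 100.0]))  # approximately 4
    print(differential_expression([10.0, 20.0], [30.0]))            # all floored to 50, so 1
```

Working on the logarithmic scale mirrors the way CT values from quantitative RT-PCR are compared, which is one reason such a form lends itself to comparison with TaqMan results.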

A gene is called up-regulated in breast cancer versus normal if E > minimal change factor given in
Table 3 and if the number of absolute calls equal to 'P' in the breast cancer population is greater
than n/2. The minimal fold change factors in Table 3 are given for those patient populations
responding to a given chemotherapy (CR), not responding to an administered chemotherapy (NC),
or those tissues without any pathological signs of a tumor (NB). Fold changes greater than 1 refer
to an increase in gene expression in the first named tissue sample compared to the second. The
regulation factors are mean values and may differ individually; here the combined profiles of the
185 genes listed in Table 1a and 1b in a cluster analysis or a principal component analysis
indicate the classification group for such sample.

According to the above, a gene is called down-regulated in breast cancer versus normal if E < the
minimal change factor given in Table 3 and if the number of absolute calls equal to 'P' in the
breast cancer population is greater than n/2. Values smaller than 1 describe a decrease in
expression of the given gene.
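The two calling rules can be combined in a short sketch. The function name and the two threshold parameters are hypothetical; in the text the thresholds are the per-gene minimal change factors of Table 3.

```python
def regulation_call(e, absolute_calls, up_threshold, down_threshold):
    """Classify a gene as 'up', 'down', or 'unchanged' in tumor versus normal.

    e              -- differential expression E for the gene
    absolute_calls -- list of 'A'/'M'/'P' calls in the breast cancer population
    up_threshold   -- hypothetical minimal change factor for up-regulation (> 1)
    down_threshold -- hypothetical minimal change factor for down-regulation (< 1)
    """
    n = len(absolute_calls)
    present = sum(1 for call in absolute_calls if call == "P")
    if present <= n / 2:          # fewer than a majority of 'P' calls: no call
        return "unchanged"
    if e > up_threshold:
        return "up"
    if e < down_threshold:
        return "down"
    return "unchanged"
```

Requiring a majority of 'P' calls before applying either threshold keeps genes with unreliable hybridization out of both lists.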

The minimal fold change factors given in Table 3 also indicate the relative up- and down-
regulation of those genes indicative of tumor presence. These genes show the highest increase or
decrease factors in the comparison of any tumor tissue to the normal healthy counterpart (NT)
(e.g. SEQ ID NO: 43, 55, 65, or 162).

The final list of differentially regulated genes consists of all up-regulated and all down-regulated
genes in biological samples from individuals with breast cancer versus biological samples from the
normal population or of an individual response pattern. Those genes on this list which are
interesting for a diagnostic or pharmaceutical application were finally validated by quantitative
real-time RT-PCR (see Example 1). If a good correlation between the expression values/behavior
of a transcript could be observed with both techniques, such a gene is listed in Tables 1 to 5.
EXAMPLE 4

Analysis of differential gene expression patterns using support vector machines 

Support vector machines (SVM) are well suited for two-class or multi-class pattern recognition
(Weston and Watkins, 1999 (115); Vapnik, 1995 (116); Vapnik, 1998 (117); Burges, 1998 (118)).

For the two-class classification problem (e.g. tumor tissue vs. non-tumor tissue, or therapy
response vs. non-response), assume that we have a set of samples, i.e., a series of input vectors

$x_i \in \mathbb{R}^d$ (i = 1, 2, ..., m)

with corresponding labels

$y_i \in \{+1, -1\}$ (i = 1, 2, ..., m).

Here, +1 and -1 indicate the two classes. To classify gene expression patterns of marker genes
from Tables 1a and 1b or 2 for describing the current tumor status or probable response to a
therapeutic agent, the input vector dimension is equal to the number of different oligonucleotide
types present on the oligonucleotide array or a subset hereof, and each input vector unit stands for
the hybridization value of one specific oligonucleotide type.

The goal is to construct a binary classifier or derive a decision function from the available samples
which has a small probability of misclassifying a future sample.

An SVM implements the following idea: it maps the input vectors

$x \in \mathbb{R}^d$

into a high-dimensional feature space

$\Phi(x) \in H$

and constructs an Optimal Separating Hyperplane (OSH), which maximizes the margin, i.e. the
distance between the hyperplane and the nearest data points of each class in the space H. By
choosing the OSH from among the many hyperplanes that can separate the positive from the negative
examples in the feature space, SVMs reduce the risk of overfitting.



Different mappings construct different SVMs. The mapping

$\Phi : \mathbb{R}^d \to H$

is performed by a kernel function

$K(x_i, x_j)$

which defines an inner product in the space H.

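The decision function implemented by an SVM can be written as (Burges, 1998 (118)):

$f(x) = \operatorname{sgn}\left( \sum_{i=1}^{m} y_i \alpha_i K(x_i, x) + b \right)$    (equation 2)

where the coefficients $\alpha_i$ are obtained by solving the following convex Quadratic
Programming (QP) problem:

Maximize $\sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j K(x_i, x_j)$

subject to $0 \le \alpha_i \le C$ and $\sum_{i=1}^{m} \alpha_i y_i = 0$    (equation 3)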



The regularization parameter C (equation 3) controls the trade-off between margin and
misclassification error. The x_i are called Support Vectors only if the corresponding α_i > 0.

Two of the kernel functions used in the current example are:

$K(x_i, x_j) = (x_i \cdot x_j + 1)^d$    (equation 4)

$K(x_i, x_j) = \exp\left( -\gamma \, \lVert x_i - x_j \rVert^2 \right)$    (equation 5)

where the first one (equation 4) is called the polynomial kernel function of degree d, which
reverts to the linear function when d = 1, and the latter (equation 5) is called the Radial Basis
Function (RBF) kernel.
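The kernels and the decision function can be sketched in a few lines of plain Python. This is an illustration only: it assumes the common forms K = (x_i · x_j + 1)^d for the polynomial kernel and K = exp(-γ‖x_i - x_j‖²) for the RBF kernel, and it takes the support vectors, coefficients α_i and offset b as given (solving the QP problem is not shown). All function names are hypothetical.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def polynomial_kernel(u, v, degree=2):
    """Assumed polynomial kernel (u.v + 1)^degree."""
    return (dot(u, v) + 1.0) ** degree


def rbf_kernel(u, v, gamma=0.5):
    """Assumed RBF kernel exp(-gamma * ||u - v||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)


def svm_decision(x, support_vectors, labels, alphas, b, kernel):
    """Evaluate sgn(sum_i y_i * alpha_i * K(x_i, x) + b); returns +1 or -1."""
    score = sum(y * a * kernel(sv, x)
                for sv, y, a in zip(support_vectors, labels, alphas)) + b
    return 1 if score >= 0 else -1  # a score of exactly 0 is assigned to +1 here
```

Only training points with α_i > 0 need to be passed in, which is why the decision function stays cheap even for large training sets.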
For a given data set, only the kernel function and the regularization parameter C must be selected
to specify one SVM. An SVM has many attractive features. For instance, the solution of the QP
problem is globally optimal, while with neural networks the gradient-based training algorithms
only guarantee finding a local minimum. In addition, SVMs can handle large feature spaces, can
effectively avoid overfitting (see above) by controlling the margin, and can automatically
identify a small subset made up of informative points, i.e., the Support Vectors.

The classification of a biological sample, and thereby the identification of a neoplastic lesion as
well as the response of such a lesion to therapeutic agents based on gene expression data, is a
multi-class classification problem. The class number k is equal to the number of tumor subclasses
(e.g. histological features, TNM stage, grade, hormonal status) or, respectively, to the number of
response subgroups to a certain therapeutic agent (e.g. pathologically confirmed complete
remission, good remission, partial remission, or no remission, as well as progressive disease)
which shall be predicted, i.e. which are present in the training data set. Due to the limited
number of different classes in the present sample set, we decided to handle the multi-class
classification by reducing it to a series of binary classifications. For a k-class classification,
k SVMs are constructed. The i-th SVM will be trained with all of the samples in the i-th class with
positive labels and all other samples with negative labels. Finally, an unknown sample is
classified into the class that corresponds to the SVM with the highest output value. This method is
used to construct a prediction/classification system for gene expression patterns of differentially
expressed marker genes as given in Tables 1a, 1b and 2.
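The one-versus-rest reduction described above can be sketched as follows. The sketch assumes each of the k binary classifiers is available as a function returning a real-valued output (the argument of sgn in equation 2); the class labels and the toy scoring functions are hypothetical.

```python
def one_vs_rest_predict(x, scorers):
    """Return the class whose binary classifier gives the highest output for x.

    scorers -- dict mapping a class label to a function f(x) -> float, where
               f was trained with that class as positive and all others negative.
    """
    return max(scorers, key=lambda label: scorers[label](x))


# Hypothetical stand-ins for three trained binary SVM outputs.
toy_scorers = {
    "CR": lambda x: x[0] - x[1],
    "NC": lambda x: x[1] - x[0],
    "normal": lambda x: -abs(x[0]) - abs(x[1]),
}
```

Comparing raw outputs rather than signs resolves the cases where several, or none, of the k classifiers vote positive.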

Each data point generated by a microarray hybridization experiment or by real-time RT-PCR (cf.
Examples 1 and 2) corresponds to and is determined by the number of mRNA copies present in the
analysed sample, i.e., from an experiment with n oligonucleotide types on a polynucleotide array a
series of n expression-level values is obtained. These n values are typically stored in a metrics
file, which is the result of the analysis of a "cel file" by the Affymetrix® Microarray Suite or
the software described above. The data from a series of m metrics files (representing m expression
analyses) are taken to build an expression matrix, in which each of the m rows consists of an
n-element expression vector for a single experiment. In order to normalise the expression values
of the m experiments, we define x_ij to be the logarithm of the expression level a_ij for gene j
(whose mRNA hybridizes with the oligonucleotide type j present on the microarray, or gives a
valid ΔΔCt intensity) in experiment i, normalized so that the expression vector x_i has the
Euclidean length 1:

$x_{ij} = \frac{\ln a_{ij}}{\sqrt{\sum_{k=1}^{n} \ln^2 a_{ik}}}$    (equation 6)
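The normalization can be sketched for a single experiment as below. It assumes the reading that each expression level is log-transformed and the resulting vector is scaled to Euclidean length 1; the function name is hypothetical, and all expression levels are assumed to be positive.

```python
import math


def normalize_expression(levels):
    """Log-transform one experiment's expression levels and scale to unit length."""
    logs = [math.log(a) for a in levels]            # requires every level > 0
    norm = math.sqrt(sum(v * v for v in logs))
    if norm == 0.0:
        raise ValueError("all log expression values are zero; cannot normalize")
    return [v / norm for v in logs]
```

Applying the function to each of the m rows of the expression matrix gives vectors that differ in direction only, so chip-to-chip intensity differences do not dominate the kernel values.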
Initial analyses are carried out using a set of 20000-element expression vectors for 150
experiments as described in Examples 1 and 2 (100 experiments in the training set and 50 in the
test set).

Using the knowledge that the 150 experiments represent three different response classes and two
different tumor states, as well as the information on tumor and non-tumor tissue, we trained the
SVMs described above with the training set to recognize those response classes and disease states.
The test set was used to assess the prediction accuracy. Here we have performed cross-validation
utilizing the "leave one out" method and, for more stringent testing, a four- to five-fold
validation (leave 25% out) with n iterations (n > 100).
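The two validation schemes can be sketched as index generators. This is a generic illustration and not the software used in the example; the function names, the fixed random seed and the rounding of the held-out fraction are assumptions.

```python
import random


def leave_one_out(n):
    """Yield (train_indices, test_indices) with each sample held out once."""
    for i in range(n):
        yield [j for j in range(n) if j != i], [i]


def repeated_holdout(n, fraction=0.25, iterations=100, seed=0):
    """Yield random splits that hold out `fraction` of the n samples each time."""
    rng = random.Random(seed)                 # fixed seed for reproducibility
    size = max(1, round(n * fraction))
    for _ in range(iterations):
        test = sorted(rng.sample(range(n), size))
        held_out = set(test)
        train = [j for j in range(n) if j not in held_out]
        yield train, test
```

The error rate is then the fraction of held-out samples that are misclassified, averaged over all splits.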

In such cross-validations and classification experiments the predictive power of a subset of marker
genes chosen from Tables 1a and 1b (e.g. SEQ ID: 27, 38, 55, 81, 97, 98) has been tested. The
average cross-validation error rate was 8.333 % with affinity levels as follows:

Tissue sample   True response   Predicted CR   Predicted NC

[illegible]     CR               0.9141        -0.9141
[illegible]     CR               1.281         -1.281
Sample_3        CR               1.149         -1.149
Sample_5        CR               0.2182        -0.2182
Sample_7        NC              -1.124          1.124
Sample_8        NC              -1.493          1.493
Sample_9        NC              -1.896          1.896
Sample_10       NC               0.475         -0.475
Sample_12       NC              -0.7557         0.7557

The misclassification of one sample can be compensated by addition of more marker genes from
Tables 1a and 1b. These data show the minimal number of marker genes that could be combined for
a predictive assay or kit.

EXAMPLE 5 

In order to optimize the prediction of non-responding tumor samples, one may use this class from
the training cohort and run multiple statistical tests suitable for group comparison, such as the
t-test or the Wilcoxon test. As listed in Table 6, one can identify such genes with a differential
expression in the non-responding tumor tissue and a significance level (p-value) below 0.05. In
Table 6, 20 genes are selected fulfilling the criteria of a low p-value and a high expression fold
change between the two classes.
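As an illustration of the group comparison, the Welch variant of the t-test (one of the tests reported in Table 6) can be sketched with the standard library. The sketch returns only the test statistic and the Welch-Satterthwaite degrees of freedom; converting these to a p-value needs the t-distribution, which is omitted here. The function name is hypothetical.

```python
import math
import statistics


def welch_t(group_a, group_b):
    """Return (t, degrees_of_freedom) for Welch's unequal-variance t-test."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)      # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    se_sq = var_a / n_a + var_b / n_b         # squared standard error of the difference
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / math.sqrt(se_sq)
    df = se_sq ** 2 / ((var_a / n_a) ** 2 / (n_a - 1) + (var_b / n_b) ** 2 / (n_b - 1))
    return t, df
```

Run once per gene on the expression values of the two classes, the statistic gives a ranking from which candidates with low p-values and high fold changes can be picked.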
One may combine the gene list selected as most preferred, given in Table 2, with those genes from
Table 1b and perform classification experiments for any so far unclassified sample and predict
the response to chemotherapy.

While the algorithms described in Example 4 can be implemented with a certain kernel to
classify samples according to their specific gene expression into two classes, another approach can
be taken to predict class membership by implementation of a k-NN classification. The method of
k-Nearest Neighbors (k-NN), proposed by T. M. Cover and P. E. Hart, is an important approach to
nonparametric classification and is quite easy and efficient. Partly because of its well-developed
mathematical theory, the NN method has developed into several variations. If we have infinitely
many sample points, the density estimates converge to the actual density function, and the
classifier becomes the Bayesian classifier if a large-scale sample is provided. In practice,
however, given a small sample, the estimation of the Bayes error usually fails, especially in a
high-dimensional space; this is called the curse of dimensionality. The k-NN method therefore has
the drawback that the sample space must be large enough.

In k-nearest-neighbor classification, the training data set is used to classify each member of a
"target" data set. The structure of the data is that there is a classification (categorical)
variable of interest (e.g. "responder" (CR) or "non-responder" (NC)), and a number of additional
predictor variables (gene expression values). Generally speaking, the algorithm is as follows:

1. For each sample in the data set to be classified, locate the k nearest neighbors in the
training data set. A Euclidean distance measure can be used to calculate how close each
member of the training set is to the target sample that is being examined.

2. Examine the k nearest neighbors: which classification do most of them belong to? Assign
this category to the sample being examined.

3. Repeat this procedure for the remaining samples in the target set.
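The three steps above can be sketched as a small function. This is a generic k-NN illustration with Euclidean distance and majority voting, not the applicant's implementation; the function name and the toy training data are hypothetical.

```python
import math
from collections import Counter


def knn_classify(sample, training, k=3):
    """Assign `sample` the majority class among its k nearest training samples.

    training -- list of (expression_vector, class_label) pairs
    """
    # Step 1: rank training samples by Euclidean distance to the target sample.
    nearest = sorted(training, key=lambda item: math.dist(sample, item[0]))[:k]
    # Step 2: majority vote among the k nearest neighbors.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical two-gene training set with responders (CR) and non-responders (NC).
toy_training = [
    ([0.0, 0.0], "CR"), ([0.0, 1.0], "CR"), ([1.0, 0.0], "CR"),
    ([5.0, 5.0], "NC"), ([5.0, 6.0], "NC"), ([6.0, 5.0], "NC"),
]
```

Step 3 is a loop over the target set, for example `[knn_classify(s, toy_training) for s in targets]`. With two classes an odd k avoids tied votes; with three or more classes ties remain possible and need an explicit rule.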

Of course the computing time goes up as k goes up, but the advantage is that higher values of k
provide smoothing that reduces vulnerability to noise in the training data. In practical
applications, k is typically in units or tens rather than in hundreds or thousands.

The "nearest neighbors" are determined by the vector under consideration and the distance
measure. Given a training set of expression values for a certain number of samples,

T = {(x1, y1), (x2, y2), ..., (xm, ym)}, the task is to determine the class of the input vector x.



The most special case of the k-NN method is k = 1, which just searches for the single nearest
neighbor:

j = argmin_i ||x - x_i||

then (x, y_j) is the solution.

For an estimation of the error rate of this classification the following considerations can be made:

A training set T = {(x1, y1), (x2, y2), ..., (xm, ym)} is called (k, d%)-stable if the error rate
of the k-NN method is d%, where d% is the empirical error rate from independent experiments. If
the clustering of the data is quite distinct (the class distance is the crucial standard of
classification), then k must be small. The key idea is that we prefer the least k in the case that
d% is bigger than the threshold value.

The k-NN method gathers the nearest k neighbors and lets them vote: the class of most neighbors
wins. Theoretically, the more neighbors we consider, the smaller the error rate. The general case
is a little more complex, but intuitively, the larger k is, the lower the upper bound, which is
asymptotic to P_Bayes(e) if N is fixed.

One can use such an algorithm to classify and cross-validate a given cohort of samples based on the
genes presented by this invention in Tables 1a and 1b. Most preferably the classification shall be
performed based on the expression levels of the genes presented in Table 1b in combination with
the genes from Table 2. With k = 3 and > 100 iterations one can get classifications as depicted
below for a cross-validation experiment with the three classes "normal breast tissue" (not affected
by cancer), non-responding tumor (NC), and responding tumor (CR). Affinities range from -1 to
1 for a given class.

Tissue sample   True class   Predicted normal   Predicted NC   Predicted CR   Remarks
                             breast tissue

[illegible]     normal       [illegible]
[illegible]     CR           -0.4988            -0.5            0.9988
Sample_3        CR           -0.4988            -0.5            0.9988
Sample_4        CR           [illegible]
Sample_5        CR           -0.4988            -0.5            0.9988
Sample_6        CR           [illegible]
Sample_8        CR           [illegible]        -0.4649         0.9535
Sample_9        NC           -0.497              0.997         -0.5
Sample_10       NC           -0.4969             0.9969        -0.5
Sample_11       NC           -0.4975             0.9975        -0.5
Sample_12       NC           -0.4982             0.9982        -0.5
Sample_13       NC            1                 -0.5           -0.5           low tumor %
Sample_14       NC           -0.5               -0.4988         0.9988        false
Sample_15       NC           -0.4976             0.9976        -0.5
Sample_16       NC           -0.4976             0.9976        -0.5



The misclassification of one sample can be compensated by addition of more marker genes from
Table 1a. These data show the minimal number of marker genes that could be combined for a
predictive assay or kit.

EXAMPLE 6 

In order to get the most accurate prediction for response to chemotherapy based on the expression
levels of genes listed in Tables 1a and 1b, one can implement a stepwise classification
model, identifying first those individuals (tumor tissues) with the highest affinity (e.g. by k-NN
classification) to the class of responding tumors (CR). If a so far unclassified tumor sample does
not belong to the class CR, one may perform a second classification step for this sample using
the expression levels of the genes from Table 1a (e.g. SEQ ID NOs: 2, 8, 9, 21, 24, 35, 53, 54, 57,
64, 80, 87, 89, 95, 97, 118 and 146), which will give in a k-NN classification a better separation
of the non-responding tumors from those which will respond partially. For this second
classification step only the predefined classes NC and PR should be utilized.
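The stepwise scheme can be sketched as a thin wrapper around two classifiers. Both classifier arguments are hypothetical callables (for example two k-NN classifiers built on different gene subsets); the toy stand-ins below exist only to make the sketch runnable.

```python
def stepwise_classify(sample, first_step, second_step):
    """Two-step prediction: screen for CR first, then separate NC from PR.

    first_step  -- callable returning "CR" or another label for a sample
    second_step -- callable trained only on the classes "NC" and "PR"
    """
    if first_step(sample) == "CR":
        return "CR"
    return second_step(sample)


# Hypothetical stand-ins for the two trained classifiers.
toy_first = lambda s: "CR" if s[0] > 1.0 else "other"
toy_second = lambda s: "PR" if s[1] > 0.5 else "NC"
```

Restricting the second classifier to NC and PR lets it use a gene subset chosen for that distinction alone.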
Table 1a: List of 165 genes which are differentially expressed in responders compared to
non-responders or normal healthy tissue. Reference is given to the SEQ ID NOs of the
sequence listing.

SEQ ID NO:  SEQ ID NO:  Gene Symbol      Ref.          Gene ID     LocusLink ID
(DNA        (Protein                     Sequences
Sequence)   Sequence)                    [A]

1           166         CTSB             NM_001908     4503138     1508
2           167         SSR1             NM_003144     14781630    6745
3           168         STX8             NM_002803     4506208     5701
4           169         KPNA2            NM_002266     4504896     3838
5           170         CSE1L            NM_001316     18591914    1434
6           171         RHEB2            NM_005614     18600748    6009
7           172         DKC1             NM_001363     15011921    1736
8           173         IGFBP4           NM_001552     10835020    3487
9           174         SMC1L1           NM_006306     -           8243
10          175         PWP1             NM_007062     5902033     11137
11          176         HDAC2            NM_001527     4557640     3066
12          177         PRKAB1           NM_006253     18602783    5564
13          178         IMPDH2           NM_000884     4504688     3615
14          179         UBE2A            NM_003336     4507768     7319
15          180         YR-29            NM_014886     7662676     10412
16          181         MUF1             NM_006369     5453747     10489
17          182         MYO10            NM_012334     11037056    4651
18          183         EGFR             NM_005228     4885198     1956
19          184         IFRD1            NM_001550     4504606     3475
20          185         CD2BP2           NM_006110     5174408     10421
21          186         ARL3             NM_004311     4757773     403
22          187         CCNB2            NM_004701     10938017    9133
23          188         FMOD             NM_002023     18548671    2331
24          189         SLC7A8           NM_012244     14751202    23428
25          190         E2-EPF           NM_014501     7657045     27338
26          191         AGT              NM_000029     4557286     183
27          192         FHL2             NM_001450     4503722     2274
28          193         LDLC             NM_007357     6678675     22796
29          194         MGC16824         NM_020314     10092674    57020
30          195         UGDH             NM_003359     4507812     7358
31          196         MAD2L1           NM_002358     6466452     4085
32          197         DDB2             NM_000107     4557514     1643
33          198         OS4              NM_005730     5031964     10106
34          199         BCL2             NM_000633     13646672    596
35          200         SEMA3C           NM_006379     5454047     10512
36          201         DTR              NM_001945     4503412     1839
37          202         GARP             NM_005512     5031706     2615
38          203         ACK1             NM_005781     8922074     10188
39          204         EDG2             NM_001401     16950637    1902
40          205         RARRES3          NM_004585     8051633     5920
41          206         CCNH             NM_001239     17738313    902
42          207         PREP             NM_002726     4506042     5550
43          208         COL11A1          NM_001854     18548530    1301
44          209         GALC             NM_000153     4557612     2581
SEQ ID NO:  SEQ ID NO:  Gene Symbol      Ref.          Gene ID     LocusLink ID
(DNA        (Protein                     Sequences
Sequence)   Sequence)                    [A]

45          210         HMGCS2           NM_005518     5031750     3158
46          211         ZNF274           NM_016324     7706506     10782
47          212         TFF1             NM_003225     4507450     7031
48          213         RAD51            NM_002875     4506388     5888
49          214         ASNS             NM_001673     4502258     440
50          215         PCMT1            NM_005389     4885538     5110
51          216         ESR1             NM_000125     4503602     2099
52          217         ACAT1            NM_000019     4557236     38
53          218         XPA              NM_000380     4507936     7507
54          219         LAF4             NM_002285     4504938     3899
55          220         COL10A1          NM_000493     18105031    1300
56          221         KIAA1041         NM_014947     15299048    22887
57          222         PLA2G7           NM_005084     4826883     7941
58          223         GRP              NM_002091     4504158     2922
59          224         CYP2B6           NM_000767     14550410    1555
60          225         CHAD             NM_001267     4502798     1101
61          226         GALNT10          NM_017540     9055207     55568
62          227         GADD45B          NM_015675     9945331     4616
63          228         WBSCR20          NM_017528     8923713     114049
64          229         BTBD2            NM_017797     8923361     55643
65          230         PGR              NM_000926     4505766     5241
66          231         TBPL1            NM_004865     4759233     9519
67          232         C4B              NM_000592     14577918    721
68          233         CCNG1            NM_004060     -           900
69          234         PDHB             NM_000925     4505686     5162
70          235         HNRPDL           NM_005463     14110410    9987
71          236         TAF11            NM_005643     5032150     6882
72          237         AMACR            NM_014324     14725899    23600
73          238         EMD              NM_000117     4557552     2010
74          239         NR2F1            NM_005654     5032172     7025
75          240         HSF2             NM_004506     6806888     3298
76          241         SPG4             NM_014946     -           6683
77          242         TRIP11           NM_004239     10863904    9321
78          243         OCLN             NM_002538     9257230     4950
79          244         CACNA1D          NM_000720     -           776
80          245         CYP2B7           NR_001278     14550410    1556
81          246         FHL1             NM_001449     4503720     2273
82          247         MSX2             NM_002449     18560141    4488
83          248         PAI-RBP1         NM_015640     7661625     26135
84          249         CLDN14           NM_012130     18593128    23562
85          250         ITPK1            NM_014216     18583687    3705
86          251         ERBB2            NM_004448     4758297     2064
87          252         TP53             NM_000546     8400737     7157
88          253         HSPA2            NM_021979     13676856    3306
89          254         LIG1             NM_015541     18554950    26018
90          255         GSS              NM_000178     4504168     2937
91          256         PRO1843          NM_018507     8924082     55378
92          257         MKI67            NM_002417     4505188     4288
93          258         BIK              NM_001197     7262371     638
94          259         KIAA0225         D86978        18566873    23165
95          260         TNRC15           AB014542      18550089    26058
96          261         SFRS5            NM_006925     5902077     6430
97          262         RPL17            NM_000985     14591906    6139
98          263         GNG12            NM_018841     -           55970






SEQ ID NO:  SEQ ID NO:  Gene Symbol      Ref.          Gene ID     LocusLink ID
(DNA        (Protein                     Sequences
Sequence)   Sequence)                    [A]

99          264         LAP1B            NM_015602     17488747    26092
100         265         LOC253782        AL080192      -           253782
101         266         COL5A1           NM_000093     18571690    1289
102         267         CXCL13           NM_006419     5453576     10563
103         268         TTS-2.2          AF055000      3231586CB1  57104
104         269         KIAA0056         D29954        18578675    23310
105         270         FLJ22642         AI700633      -           -
106         271         LOC113146        W28438        15300131    113146
107         272         GPR126           NM_020455     18562351    57211
108         273         PMSCL1           NM_005033     4826921     5393
109         274         KIAA0418         NM_014631     7662103     -
110         275         SULF1            NM_015170     18571189    23213
111         276         KIAA0673         NM_015102     14720169    261734
112         277         FLJ10803         NM_018224     -           55744
113         278         DKFZp586M0723    AL050227      -           -
114         279         C4A              NM_007293     14577920    720
115         280         ZAP3             L40403        18597333    56252
116         281         NEK9             NM_033116     14916458    91754
117         282         FLJ13125         AK023187      14726621    -
118         283         FMO5             NM_001461     4503760     2330
119         284         COMP             NM_000095     4557482     1311
120         285         CSPG2            NM_004385     4758081     1462
121         286         LOC151996        AA418080      18554956    -
122         287         TFAP2B           NM_003221     4507442     7021
123         288         OR7E38P          AF065854      18544324    10821
124         289         RAB31            NM_006868     5803130     11031
125         290         HSPC126          NM_014166     14759175    29079
126         291         UMP-CMPK         NM_016308     7706496     51727
127         292         FLJ22195         NM_022758     12232426    64771
128         293         DCTN4            NM_016221     14733974    51164
129         294         FLJ20273         NM_019027     9506670     54502
130         295         KIF4A            NM_012310     14765683    24137
131         296         THTP             NM_024328     13236576    79178
132         297         PLSCR4           NM_020353     9966818     57088
133         298         FLJ11323         NM_018390     8922994     55344
134         299         MGC11242         NM_024320     13236560    79170
135         300         CEGP1            NM_020974     10190747    57758
136         301         SRR              NM_021947     8922495     63826
137         302         HSPC177          NM_015961     7705488     51510
138         303         MGC3103          NM_024036     13128987    78999
139         304         FLJ20641         NM_017915     8923595     55010
140         305         FLJ13646         NM_024584     13375767    79635
141         306         KCNK15           NM_022358     16507967    60598
142         307         RNASEL           NM_021133     10863928    6041
143         308         CRSP6            NM_004268     18577903    9440
144         309         COL5A2           NM_000393     16554580    1290
145         310         LOC51218         NM_016417     9994192     51218
146         311         APBB2            NM_173075     18557629    323
147         312         yy15c12.s1       IM31716       -           -
148         313         AD037            NM_032023     14042936    83937
149         314         FLJ20477         AA203365      8923441     -
150         315         MARKL1           NM_031417     13899224    57787
151         316         LUM              NM_002345     4505046     4060
152         317         COL3A1           NM_000090     15149480    1281
SEQ ID NO:  SEQ ID NO:  Gene Symbol      Ref.          Gene ID     LocusLink ID
(DNA        (Protein                     Sequences
Sequence)   Sequence)                    [A]

153         318         COL1A1           NM_000088     18587373    1277
154         319         BF               NM_001710     14550403    629
155         320         ADAM12           NM_003474     13259517    8038
156         321         LOXL1            NM_005576     5031882     4016
157         322         CEACAM6          NM_002483     4505340     4680
158         323         MMP11            NM_005940     13027795    4320
159         324         MMP1             NM_002421     13027798    4312
160         325         MMP13            NM_002427     13027796    4322
161         326         SERPINH1         NM_001235     4757923     872
162         327         PITX1            NM_002653     4505824     5307
163         328         RAD52            NM_015419     18390318    25878
164         329         INHBA            NM_002192     4504698     3624
165         330         CSPG2            NM_004385     4758081     1462



Table 1b: List of 20 genes which are differentially expressed in non-responding tumors
compared to tumors with at least a minor therapy-associated regression or normal healthy
tissue. Reference is given to the SEQ ID NOs of the sequence listing.

SEQ ID NO:  SEQ ID NO:  Gene Symbol      Ref.          UniGene ID  LocusLink ID
(DNA        (Protein                     Sequences
Sequence)   Sequence)                    [A]

472         492         PRG1             NM_002727     1908        5552
473         493         GBP1             NM_002053     62661       2633
474         494         ALEX2            NM_014782     48924       9823
475         495         CD53             NM_000560     82212       963
476         496         VCAM1            NM_001078     109225      7412
477         497         MAPT             NM_005910     101174      4137
478         498         EGR2             NM_000399     1395        1959
479         499         TDO2             NM_005651     183671      6999
480         500         ADAMDEC1         NM_014479     145296      27299
481         501         TFEC             NM_012252     113274      22797
482         502         BTF3             NM_001207     101025      689
483         503         FLNB             NM_001457     81008       2317
484         504         TFRC             NM_003234     77356       7037
485         505         EIF4B            NM_001417     93379       1975
486         506         MAPK3            -             861         5595
487         507         LOC161291        -             85335       161291
488         508         SLC1A1           NM_004170     91139       6505
489         509         MST4             NM_016542     23643       51765
490         510         BLAME            NM_014036     20450       56833
491         511         NME7             NM_013330     274479      29922
Table 2: List of 47 preferred genes which are differentially expressed in responders compared to
non-responders or normal healthy tissue. Listed genes are preferred genes, e.g., for use in the
assessment whether or not a subject is expected to respond to a given mode of treatment.

SEQ ID NO:  SEQ ID NO:  Gene Symbol      Ref.          Gene ID     LocusLink ID
(DNA        (Protein                     Sequences
Sequence)   Sequence)                    [A]

4           169         KPNA2            NM_002266     4504896     3838
5           170         CSE1L            NM_001316     18591914    1434
6           171         RHEB2            NM_005614     18600748    6009
7           172         DKC1             NM_001363     15011921    1736
8           173         IGFBP4           NM_001552     10835020    3487
11          176         HDAC2            NM_001527     4557640     3066
12          177         PRKAB1           NM_006253     18602783    5564
13          178         IMPDH2           NM_000884     4504688     3615
15          180         YR-29            NM_014886     7662676     10412
22          187         CCNB2            NM_004701     10938017    9133
23          188         FMOD             NM_002023     18548671    2331
24          189         SLC7A8           NM_012244     14751202    23428
25          190         E2-EPF           NM_014501     7657045     27338
26          191         AGT              NM_000029     4557286     183
27          192         FHL2             NM_001450     4503722     2274
29          194         MGC16824         NM_020314     10092674    57020
31          196         MAD2L1           NM_002358     6466452     4085
32          197         DDB2             NM_000107     4557514     1643
40          205         RARRES3          NM_004585     8051633     5920
43          208         COL11A1          NM_001854     18548530    1301
50          215         PCMT1            NM_005389     4885538     5110
51          216         ESR1             NM_000125     4503602     2099
55          220         COL10A1          NM_000493     18105031    1300
58          223         GRP              NM_002091     4504158     2922
61          226         GALNT10          NM_017540     9055207     55568
65          230         PGR              NM_000926     4505766     5241
68          233         CCNG1            NM_004060     -           900
69          234         PDHB             NM_000925     4505686     5162
74          239         NR2F1            NM_005654     5032172     7025
81          246         FHL1             NM_001449     4503720     2273
82          247         MSX2             NM_002449     18560141    4488
83          248         PAI-RBP1         NM_015640     7661625     26135
92          257         MKI67            NM_002417     4505188     4288
98          263         GNG12            NM_018841     -           55970
100         265         LOC253782        AL080192      -           253782
101         266         COL5A1           NM_000093     18571690    1289
104         269         KIAA0056         D29954        18578675    23310
105         270         FLJ22642         AI700633      -           -
106         271         LOC113146        W28438        15300131    113146
108         273         PMSCL1           NM_005033     4826921     5393
113         278         DKFZp586M0723    AL050227      -           -
124         289         RAB31            NM_006868     5803130     11031
128         293         DCTN4            NM_016221     14733974    51164
132         297         PLSCR4           NM_020353     9966818     57088
129         294         FLJ20273         NM_019027     9506670     54502
133         298         FLJ11323         NM_018390     8922994     55344
138         303         MGC3103          NM_024036     13128987    78999
Table 3: Relative expression of 165 genes in complete responders as compared to non-
responders and normal tissue. (CR - complete responder to therapy;
NC - no change in tumor state; NT - normal healthy tissue)

SEQ ID NO:  SEQ ID NO:  Gene Symbol      CR_vs_NC      CR_vs_NT      NC_vs_NT
(DNA        (Protein
Sequence)   Sequence)

1           166         CTSB             1.69033759    2.53990608    1.50260284
2           167         SSR1             1.69676002    1.56735024    0.92373125
3           168         STX8             1.42795315    1.65931125    1.16202079
4           169         KPNA2            2.10809096    2.08540708    0.98923961
5           170         CSE1L            2.00249838    2.79008752    1.39330326
6           171         RHEB2            1.84519193    1.60184035    0.86811584
7           172         DKC1             2.25597289    2.3855889     1.0574546
8           173         IGFBP4           0.27862606    0.38691248    1.38864428
9           174         SMC1L1           1.69816116    1.71849631    1.01197481
10          175         PWP1             0.64477544    0.59496475    0.92274723
11          176         HDAC2            3.14799689    2.11008385    0.67029413
12          177         PRKAB1           0.52384682    0.56333165    1.07537477
13          178         IMPDH2           0.43342682    0.53415121    1.23239078
14          179         UBE2A            1.56667644    1.8748269     1.19669056
15          180         YR-29            0.51635771    0.3928245     0.7607604
16          181         MUF1             1.48621121    1.67042393    1.12394787
17          182         MYO10            2.64854259    1.9657171     0.74218822
18          183         EGFR             1.84523855    0.3988927     0.21617406
19          184         IFRD1            2.34518159    0.67841153    0.28927889
20          185         CD2BP2           0.40973605    0.74398402    1.81576414
21          186         ARL3             0.46877208    0.81409499    1.73665419
22          187         CCNB2            2.94729142    5.81162556    1.97185304
23          188         FMOD             0.33346407    0.24429053    0.73258426
24          189         SLC7A8           0.23327957    0.68038164    2.91659333
25          190         E2-EPF           2.50218494    4.49667635    1.79709992
26          191         AGT              0.38629467    0.52277847    1.35331525
27          192         FHL2             0.31699809    0.39190285    1.23629407
28          193         LDLC             0.56234146    0.88888889    1.58069244
29          194         MGC16824         0.51520913    0.67362665    1.30748198
30          195         UGDH             0.4487715     0.59229116    1.31980566
31          196         MAD2L1           4.48217081    6.89647789    1.53864683
32          197         DDB2             0.37904516    0.3243275     0.85564341
33          198         OS4              0.64290847    0.50896135    0.79165444
34          199         BCL2             0.37660415    0.26111358    0.69333698
35          200         SEMA3C           0.5199821     0.48877024    0.93997512
36          201         DTR              7.22480411    0.4189956     0.05799404
37          202         GARP             0.47456604    0.3525155     0.74281654
38          203         ACK1             0.52564876    0.49278642    0.93748232
39          204         EDG2             0.71655585    0.46969319    0.6554872
40          205         RARRES3          0.24142196    1.41881212    5.87689745
41          206         CCNH             0.55809994    0.42039831    0.75326706
42          207         PREP             1.84855753    1.63361667    0.88372509
43          208         COL11A1          0.6377322     30.5047541    47.8331723
44          209         GALC             0.50650838    0.63980608    1.26316978
45          210         HMGCS2           0.04797018    0.03074921
46          211         ZNF274           1.70500973    0.86640362
47          212         TFF1             0.0321807     0.2064045
48          213         RAD51            3.1036169     2.89007176
49          214         ASNS             3.60284107    2.12910917
50          215         PCMT1            2.46691568    1.76150989
51          216         ESR1             0.12287491
52          217         ACAT1            0.51017664
53          218         XPA              0.51539825
54          219         LAF4             0.23519327
55          220         COL10A1          0.38555774
56          221         KIAA1041         1.44589009
57          222         PLA2G7           4.23491725
58          223         GRP              0.12594309
59          224         CYP2B6           0.01213194
60          225         CHAD             0.02707726
61          226         GALNT10          0.32020561
62          227         GADD45B          0.51944741
63          228         WBSCR20          1.61337697
64          229         BTBD2            0.59662324
65          230         PGR              0.06700908
66          231         TBPL1            1.71529386
67          232         C4B              0.12173232
68          233         CCNG1            0.46882525
69          234         PDHB             0.48347992
70          235         HNRPDL           0.62657647
71          236         TAF11            1.83477376
72          237         AMACR            0.61312794
73          238         EMD              1.6831552
74          239         NR2F1            0.2644984
75          240         HSF2             1.72328808
76          241         SPG4             2.02820496
77          242         TRIP11           0.63637488
78          243         OCLN             0.47955471
79          244         CACNA1D          0.16768932
80          245         CYP2B7           0.01399196
81          246         FHL1             0.30932043
82          247         MSX2             0.26991798
83          248         PAI-RBP1         2.81808253
84          249         CLDN14           0.34578658
85          250         ITPK1            0.59689657
86          251         ERBB2            1.86323083
87          252         TP53             0.51575976
88          253         HSPA2            0.09735986
89          254         LIG1             0.3244685
90          255         GSS              0.58258632
91          256         PRO1843          0.57531505
92          257         MKI67            2.0943328
93          258         BIK              0.50587875
94          259         KIAA0225         2.13074615
95          260         TNRC15           0.63566173
96          261         SFRS5            0.55670226
97          262         RPL17            0.67408803
98          263         GNG12            0.39809519
99          264         LAP1B            0.59182478
100         265         LOC253782        0.33656287
101         266         COL5A1           0.48612506
Table 6: Statistical relevance of 20 genes differentially expressed in non-responders (NC) as compared
to responding tumors (CR, complete responder to therapy)

SEQ ID NO:   SEQ ID NO:   Gene_Symbol   T-Test      Welch-Test   Wilcoxon
(DNA         (Protein                   p-value     p-value      p-value
Sequence)    Sequence)

472          492          PRG1          0.0002116   0.0002631    0.0003108
473          493          GBP1          0.0020070   0.0023060    0.0029530
474          494          ALEX2         0.0003502   0.0012570    0.0001554
475          495          CD53          0.0019770   0.0039540    0.0018650
476          496          VCAM1         0.0010630   0.0010690    0.0018650
477          497          MAPT          0.0005838   0.0007540    0.0001554
478          498          EGR2          0.0008870   0.0009158    0.0006216
479          499          TDO2          0.0084350   0.0105000    0.0018650
480          500          ADAMDEC1      0.0018700   0.0021870    0.0029530
481          501          TFEC          0.0085550   0.0155500    0.0010880
482          502          BTF3          0.0001140   0.0001471    0.0003108
483          503          FLNB          0.0006050   0.0007720    0.0018650
484          504          TFRC          0.0005408   0.0010110    0.0010880
485          505          EIF4B         0.0013130   0.0013330    0.0006216
486          506          MAPK3         0.0001388   0.0003527    0.0006216
487          507          LOC161291     0.0015790   0.0031610    0.0006216
488          508          SLC1A1        0.0000179   0.0000389    0.0001554
489          509          MST4          0.0000888   0.0000904    0.0001554
490          510          BLAME         0.0048620   0.0081110    0.0029530
491          511          NME7          0.0020950   0.0021980    0.0006216
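
The Wilcoxon p-values in Table 6 take only a handful of distinct values, which is the behaviour expected of an exact rank-sum test on small groups. The following is a minimal sketch of such a test, not the procedure used for the table. It assumes two groups of eight samples each; this group size is an inference from the smallest tabulated value (0.0001554 is approximately 2/12870, and 12870 is the number of ways to choose 8 of 16 samples) and is not stated in the table. The function name `exact_rank_sum_p` and the expression values are hypothetical.

```python
from itertools import combinations
from math import comb


def exact_rank_sum_p(group_a, group_b):
    """Exact two-sided Wilcoxon rank-sum p-value (assumes no tied values)."""
    pooled = sorted(group_a + group_b)
    rank = {value: i + 1 for i, value in enumerate(pooled)}  # ranks 1..N
    n_a, n_total = len(group_a), len(pooled)
    observed = sum(rank[v] for v in group_a)
    expected = n_a * (n_total + 1) / 2  # mean rank sum under the null hypothesis
    # Count the group assignments whose rank sum lies at least as far
    # from the null mean as the observed rank sum does.
    extreme = sum(
        1
        for subset in combinations(range(1, n_total + 1), n_a)
        if abs(sum(subset) - expected) >= abs(observed - expected)
    )
    return extreme / comb(n_total, n_a)


# Hypothetical expression ratios for one gene (illustration only):
# 8 non-responders versus 8 complete responders, fully separated.
non_responders = [0.21, 0.24, 0.28, 0.32, 0.35, 0.39, 0.41, 0.47]
complete_responders = [1.41, 1.50, 1.63, 1.71, 1.83, 2.08, 2.38, 2.53]

p = exact_rank_sum_p(non_responders, complete_responders)
print(f"{p:.7f}")  # 2/12870, printed as 0.0001554
```

With fully separated groups only two of the 12870 possible assignments are as extreme as the observed one, so the sketch returns the smallest p-value such a test can produce for this assumed group size.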
CLAIMS
30 June 200*1

1 Method for characterizing the state of a neoplastic disease in a subject, comprising

(i) determining the pattern of expression levels of at least 6, 8, 10, 15, 20, 30, or 47
marker genes, comprised in a group of marker genes consisting of SEQ ID NO:1 to
165, in a biological sample from said subject,

(ii) comparing the pattern of expression levels determined in (i) with one or several
reference pattern(s) of expression levels,

(iii) characterizing the state of said neoplastic disease in said subject from the outcome
of the comparison in step (ii).

2 Method for characterizing the state of a neoplastic disease in a subject, comprising

(i) determining the pattern of expression levels of at least 6, 8, 10, 15, 20, 30, 47 or 67
marker genes, comprised in a group of marker genes consisting of SEQ ID NO:1 to
165 and 472 to 491, in a biological sample from said subject,

(ii) comparing the pattern of expression levels determined in (i) with one or several
reference pattern(s) of expression levels,

(iii) characterizing the state of said neoplastic disease in said subject from the outcome
of the comparison in step (ii).

3 Method for detection, diagnosis, screening, monitoring, and/or prognosis of a neoplastic
disease in a subject, comprising

(i) determining the pattern of expression levels of at least 1, 2, 3, 5, 10, 15, 20, 30, or
4 Method for detection, diagnosis, screening, monitoring, and/or prognosis of a neoplastic
disease in a subject, comprising

(i) determining the pattern of expression levels of at least 1, 2, 3, 5, 10, 15, 20, 30, 47
or 67 marker genes, comprised in a group of marker genes consisting of SEQ ID
NOs:1 to 17, 19 to 33, 35 to 50, 52 to 64, 66 to 85, 88 to 91, and 93 to 165 and 472
to 491 in biological samples from said subject,

(ii) comparing the pattern of expression levels determined in (i) with one or several
reference pattern(s) of expression levels,

(iii) detecting, diagnosing, screening, monitoring, and/or prognosing said neoplastic
disease in said subject from the outcome of the comparison in step (ii).

5 Method of any of claims 1 to 4, wherein said method comprises multiple determinations of
a pattern of expression levels, at different points in time, thereby allowing monitoring of the
development of said neoplastic disease in said subject.

6 Method of claim 1 or 2, wherein said method comprises an estimation of the likelihood of
success of a given mode of treatment for said neoplastic disease in said subject.

7 Method of claim 1 or 2, wherein said method comprises an assessment of whether or not
the subject is expected to respond to a given mode of treatment for said neoplastic disease.

8 Method of claim 6 or 7, wherein a predictive algorithm is used.

9 Method of claim 8, wherein the predictive algorithm is a Support Vector Machine.

10 Method of any of claims 6 to 9, wherein said given mode of treatment
(i) identifying the most promising mode of treatment with the method of claim 6 or 7,

(ii) treating said neoplastic disease in said patient by the mode of treatment identified
in step (i).

12 Method of screening for subjects afflicted with a neoplastic disease, wherein a method of
any of claims 1 to 4 is applied to a plurality of subjects.

13 Method of screening for substances and/or therapy modalities having curative effect on a
neoplastic disease comprising

(i) obtaining a biological sample from a subject afflicted with said neoplastic disease,

(ii) assessing, from said biological sample, using the method of claim 6 or 7, whether
said subject is expected to respond to a given mode of treatment for said neoplastic
disease,

(iii) if said subject is expected to respond to said given mode of treatment, incubating
said biological sample with said substance under said therapy modalities,

(iv) observing changes in said biological sample triggered by said test substance under
said therapy modalities,

(v) selecting or rejecting said test substance and/or said therapy modalities, based on
the observation of changes in said biological sample under (iv).



or 



14 Method of screening for compounds having curative effect on a neoplastic disease
comprising

(i) incubating biological samples or extracts of these with a test substance, 
Method of screening for compounds having curative effect on a neoplastic disease
comprising

(i) incubating biological samples or extracts of these with a test substance,

(ii) determining the pattern of expression levels of at least 1, 2, 3, 5, 10, 15, 20, 30, 47
[...]

(iii) [...] of expression levels determined [...]

(iv) selecting or rejecting said test substance based on [...]

[...] 15 [...] a group of [...]

Method of any of claims 1 to 1[...], wherein the expression level is determined

(i) with a hybridization based method, or

(ii) with a hybridization based method utilizing arrayed probes, or

(iv) by real time PCR, or

(v) by assessing the expression of polypeptides, proteins or derivatives thereof, or
A kit comprising at least 6, 8, 10, 15, 20, 30, 47, or 67 primer pairs and probes suitable for
marker genes comprised in a group of marker genes consisting of

(i) SEQ ID NO:1 to SEQ ID NO:165, and/or

(ii) SEQ ID NO:472 to SEQ ID NO:491, or

(iii) the marker genes listed in Table 2.

A kit comprising at least 6, 8, 10, 15, 20, 30, or 47 individually labeled probes, each
having a sequence comprised in a group of sequences consisting of SEQ ID NO:331 to
SEQ ID NO:471.

A kit comprising at least 6, 8, 10, 15, 20, 30, 47 or 67 individually labeled probes, each
having a sequence comprised in a group of sequences consisting of SEQ ID NO:331 to
SEQ ID NO:471 and SEQ ID NO:512 to 571.

A kit comprising at least 6, 8, 10, 15, 20, 30, or 47 arrayed probes, each having a sequence
comprised in a group of sequences consisting of SEQ ID NO:331 to SEQ ID NO:471.

A kit comprising at least 6, 8, 10, 15, 20, 30, 47 or 67 arrayed probes, each having a
sequence comprised in a group of sequences consisting of SEQ ID NO:331 to SEQ ID
NO:471 and SEQ ID NO:512 to 571.
METHODS AND KITS FOR INVESTIGATING CANCER
ABSTRACT OF THE DISCLOSURE

The invention provides novel compositions, methods and uses for the prediction, diagnosis,
prognosis, prevention and treatment of malignant neoplasia and breast cancer. The invention
further relates to genes that are differentially expressed in breast tissue of breast cancer patients
versus those of normal "healthy" tissue. Differentially expressed genes for the identification of
patients who are likely to respond to chemotherapy are also provided.